(57) The method and system for TV user profile data prediction and modeling allow accurate and narrowly
focused behavioral clustering. A client-side system classifies television consumers into representative user
profiles. The profiles target individual user advertising and program preference category groups. A contextual
behavioral profiling system determines the user's monitor behavior and content preferences, and the system may
be continually updated with user information. A behavioral model database is queried by various system
modules. The programming, including targeted advertising for television and interactive television, is based on
the profile data prediction, modeling and preference determination. The system is enabled to present a complete
program sequence to the viewer based on the preference determination and stored programming. The latter
is referred to as automatic program sequence (virtual channel) creation and the virtual channel can be
presented as a separate channel in an electronic programming guide (EPG).

Description

CROSS-REFERENCE TO RELATED APPLICATIONS 

[0001] This application claims the benefit under 35 U.S.C. § 119(e) of provisional application No. 60/260,745, filed
January 9, 2001.

[0002] Further reference is had to the disclosures found in the commonly assigned, concurrently filed, copending
patent application No. [Attorney Docket No. MET1.0002]; application No. 09/893,192, describing a system and method
for delivery of television programs and targeted de-coupled advertising; application No. 09/096,592 entitled "Television
Program Recording with User Preference Determination;" and application No. 09/953,327 describing logic operators
for delivery of targeted programming, and SQL query operators for targeting expressions. The disclosures of the
copending applications are herewith incorporated by reference.

BACKGROUND OF THE INVENTION 

FIELD OF THE INVENTION 

[0003] The invention lies in the field of interactive television programming. Specifically, the invention pertains to a 
method and system for TV user profile data prediction and modeling, to a method and system for program and/or 
advertisement program preference determination, to a method and system for targeted advertising for television and 
interactive television based on the profile data prediction, modeling and preference determination, and to a method 
and system with which a complete program sequence can be presented to the viewer based on the preference deter- 
mination and stored programming. The latter will be referred to as automatic program sequence (Virtual Channel) 
creation and the virtual channel will be presented as a separate channel in the electronic programming guide (EPG). 

DESCRIPTION OF THE PRIOR ART 

[0004] Systems and methods to target advertising in interactive television are known. The prior art systems and
methods generally target advertising through a statistically sampled, program driven mechanism. Advertising for
television is priced in accordance with the rating of a certain program and time slot. Advertisements must be placed so
that they reach the intended target audience. The more audience a certain program delivers, and the more clearly
focused that audience is with regard to the demographic information, the higher the price for placing the advertisement.
By far the most popular TV ratings system currently in use in the United States is Nielsen Media Research. The Nielsen
ratings and share system is based on a 5000 member national sample and approximately 50 local market samples.
The information gleaned from the national sample is based on a measurement of which program is watched at a certain
time in a given television household and by which members of the household. The latter information is determined via
so-called People Meters that are installed in the sample households and via which the viewers indicate when they are
watching TV at a certain time by pushing a button individually assigned to them. The national sample utilizes rather
crude demographic information to define preference ratings for the program determination. The results are published
via ratings that are defined relative to the statistical universe (e.g., all television households, male 20 to 40 years, etc.)
and by shares. The latter represent a percentage of the universe members watching a given program at the time of its
broadcast.

[0005] A slightly more accurate system, referred to as the Portable People Meter, is currently being tested in a limited 
local television market by Arbitron. The Portable People Meter is a pager-sized electronic transceiver that records a
person's television usage via inaudible codes that are superimposed on television programs. At the end of the day, the
transceiver is placed on a base station, from which the recorded information is then sent to a central data processing 
facility. 

[0006] In the context of TV user profile data prediction and modeling, the prior art methods and systems do not use
program arrival and departure frequency and click timing as preference indicators. Preference ratings in the context
of programming predictions are thus rather rudimentary. Since prior art systems do not model transitions, sequential
program behavior, and temporal program utilization in a general predictive architecture, they are unable to predict a
user's preference based on sophisticated content and temporal relationships.

[0007] By not assessing when there is adequate evidence to infer a preference, known methods tend to incorrectly
predict user preferences, or they may wait too long before building higher confidence. Known classification methods
require that all feature dimensions of a sample be correlated to the observation, and then assume a Gaussian
distribution parameterization to describe group clusters. However, this is inaccurate as the data are not generally
subject to normal distribution.

[0008] In the context of program or advertising program preference determination, the prior art methods do not have
an automatic user input and thus no method of learning which metrics best predict a certain user's preference. Further,
if preference ratings are available for a given demographic group, they are only stationarily weighted and no dynamic
weighting adjustment is effected.

[0009] In the context of targeted advertising for television and interactive television, the prior art methods principally
use demographic information, not contextual behavioral information as part of the user targeting profile. This reduces
targeting performance in non-demographically classifiable consumer groups, and demographic inferring accuracy.

SUMMARY OF THE INVENTION 

[0010] It is accordingly an object of the invention to provide a system and method for behavioral model clustering in
TV usage and targeted advertising and preference programming, which overcomes the above-mentioned
disadvantages of the heretofore-known devices and methods of this general type.

[0011] With the foregoing and other objects in view there is provided, in accordance with the invention, a television
rating system for targeted program delivery, comprising: 

a clustering engine receiving television viewing data input, processing the viewing data input, and generating user 
profiles targeting advertising category groups; 

a client-side system adapted to classify a television user into at least one advertising category group; 

a contextual behavioral profiling system connected to the client-side system and determining a television user's
viewing behavior with content and usage-related preferences; and 

a behavioral model database connected to the profiling system and storing therein information with the television 
user's viewing behavior. 

[0012] In accordance with an added feature of the invention, the clustering engine is a software agent residing in a
central computer system at a television distribution head-end and is programmed to create template behavioral profiles
corresponding to targeted advertising categories of television viewers.

[0013] In accordance with an additional feature of the invention, the clustering engine is trained substantially
exclusively on tagged viewing data from a given target group to learn a most general profile of the given target group.

[0014] In accordance with another feature of the invention, the clustering engine is programmed to generalize viewers'
profiles in each group into a representative aggregation for a respective advertising category, and to evolve advertising
category profiles by aggregating all dimensions most strongly in common for the given group and most unique across
target groups.

[0015] In accordance with a further feature of the invention, there is provided an advertisement manager connected
to query the behavioral model database. The advertisement manager is programmed to parameterize behavioral
profiles of the behavioral model database and to download the parameterized behavioral profiles to an advertising
category membership agent residing at the client-side system. Preferably, the advertising category membership agent
is configured to reconstruct the downloaded parameterized targeting models, and apply a clustering engine to the
television user's history to determine a most likely advertising category the user belongs to and store the resulting
category probabilities in a user category database. Further, there may be provided targeting agents and presentation
agents disposed at the client-side system for combining the targeting category probabilities and relevant preference
information to selectively capture, store, and display advertisements downloaded in accordance with the optimization.

[0016] With the above and other objects in view, there is also provided, in accordance with the invention, a preference
engine for use in an interactive display system with a head-end side distributing program content and a client side
receiving the program content and selectively displaying the program content in accordance with a user's selection.
The preference engine determines the user's preferred program content and includes:

a user monitoring device connected at the client side to record contextual transition behaviors profiling one or more
users and to continually build a knowledgebase of preferences and contextual transition behaviors profiling the
one or more users; and

a device for providing to the one or more users the program content in accordance with the user's demographic
information and with the contextual transition behavior profile.

[0017] The user monitoring device of the preference engine models the user's behavioral interaction with advertising
program content and with entertainment program content.

[0018] In accordance with again an added feature of the invention, the preference engine is connected to receive
from the head-end metadata describing advertising content and metadata describing entertainment program content,
and programmed to establish content preferences by combining metadata information with the contextual transition
behavior profile, and to build a relational knowledge base with associations between the user's behavior, demographics,
and program content preferences. The preference engine is programmed to model patterns of usage behaviors with
a behavioral model and to extract key usage information from the behavioral model into a behavioral database, wherein
each entry in the behavioral database has a confidence value associated therewith that reflects an estimate of a
structural and sampling quality of the data used to calculate the database entry.

[0019] With the above and other objects in view there is also provided, in accordance with the invention, a system 
for targeted program delivery in a program content delivery system having a head-end side and a client side. The 
targeting system comprises:

a central data system at the head-end side receiving viewing data selected from the group consisting of watch
date, watch start time, watch duration, and watch channel, demographic information describing a program user,
and an electronic program guide with metadata describing a program content;

a demographic cluster knowledge base acquirer receiving behavioral data of the user and outputting a knowledge 
base in form of a transition matrix with weight sets, the transition matrix predicting a demographic group of the 
user; and 

a program content generating module providing to the client side streams of program content including
advertisements based on the predicted demographic group of the user.

[0020] In accordance with again an added feature of the invention, there is provided a realtime feedback link for
delivering to the central data system realtime information concerning a user's viewing behavior with click stream data. 
[0021] In accordance with again an additional feature of the invention, the demographic cluster knowledge base 
acquirer is based on a hidden Markov model. 

[0022] In accordance with again another feature of the invention, the demographic cluster knowledge base acquirer 
and the program content generating module are software modules each adapted to be stored on a machine-readable 
medium in the form of a plurality of processor-executable instructions.

[0023] In a preferred embodiment, the demographic cluster knowledge base acquirer generates demographic cluster
information of the user in terms of statistical state machine transition models. The state machines are defined in the
transition matrix, and the transition matrix contains information of program transitions initiated by the viewer.
[0024] Preferably, there are provided at least two concurrent transition matrices including a channel matrix and a 
genre matrix. Other matrices are possible as well, such as a title matrix, an actor matrix, and so on. 
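
As an illustration of the concurrent matrices described above, the following minimal sketch builds a channel matrix and a genre matrix from one viewing sequence. It assumes that each matrix simply stores row-normalized transition frequencies; the function name `build_transition_matrix`, the sample log, and the dictionary layout are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def build_transition_matrix(states):
    """Count transitions between consecutive states and row-normalize the counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(states, states[1:]):
        counts[src][dst] += 1
    matrix = {}
    for src, row in counts.items():
        total = sum(row.values())
        matrix[src] = {dst: n / total for dst, n in row.items()}
    return matrix


# Hypothetical viewing log: one (channel, genre) pair per program the viewer tuned to.
viewing_log = [
    ("CNN", "news"), ("ESPN", "sports"), ("CNN", "news"),
    ("ESPN", "sports"), ("HBO", "movie"),
]

# The same sequence of viewer-initiated transitions feeds both matrices.
channel_matrix = build_transition_matrix([channel for channel, _ in viewing_log])
genre_matrix = build_transition_matrix([genre for _, genre in viewing_log])
```

A title matrix or an actor matrix could be produced the same way by projecting the log onto a different metadata field.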
[0025] In accordance with again a further feature of the invention, the demographic cluster knowledge base acquirer 
is configured to parameterize the user's behavior with a double random pseudo hidden Markov process, and to define 
a low-level statistical state machine modeling a behavioral cluster and a top-level statistical state machine with active 
behavioral clusters and an interaction between the active behavioral clusters. 

[0026] In accordance with a concomitant feature of the invention, the demographic cluster knowledge base acquirer 
is configured to define a double random process with a plurality of dimensions, and to determine parallel statistical 
state machine transition events in at least two of three state categories including channel, genre, and title of the program
content.

[0027] The global profile represents demographic cluster information of the viewer in terms of the statistical state 
machine transition models. The invention provides for TV user profile data prediction and modeling. The resultant
behavioral metrics tend to uniquely characterize individuals, and their preferences. The transition processes model 
user sequences and temporal transition preferences. The invention provides for a method to determine confidence in 
data quantity, and quality; for an algorithm to determine a distance between non-Gaussian, highly dimensional distri- 
butions; and a method to determine adequate separation between clusters for group membership classification. 
[0028] The query interface according to the invention provides behavioral preference information to other system
modules.

[0029] The novel program or Ad program preference determination uses:

■ Weighted fuzzy voting preference metrics based on modeled usage context, content access timing, and content
parameter sequencing.

■ Frequency reinforced, non-linear preference metric vote weight learning architecture.



EP 1 223 757 A2 



■ A vote aggregation algorithm that determines the top n content parameters (i.e., channels, genres, actors, titles,
etc.) by adjusting for vote-to-vote quality and relative preference trends.

[0030] In a further conceptual group, targeted advertising for TV and interactive TV provides for:

■ A training method to aggregate users in the target category

■ A pruning technique to create the most representative user targeting category template and efficiently download
it to the TV client system

■ An efficient user targeting category membership determination scheme

■ Automatic Virtual Channel program sequence creation using stored preferred programming and presented as a 
channel in the EPG. 

[0031] Finally, there are provided algorithms to automatically place stored programs and Ads into a virtual channel's
EPG (alongside normal EPG entries) according to the user's preferred context (i.e., time, sequence, etc.).

[0032] The invention thus provides for a very accurate system of TV user profile data prediction and modeling. Prior
art methods do not use program arrival and departure frequency and timings as preference indicators, thus they have
less accurate preference ratings. Here, categories such as liked, unliked, and surfing conditions are modeled separately
to better match a person's different behavioral meanings for each case. Prior art systems do not model transition,
sequential, and temporal behavior in a general predictive architecture. Thus, they are unable to predict a user's
preference based on sophisticated content and temporal relationships. By not assessing when there is adequate
evidence to infer a preference, known methods tend to incorrectly predict user preferences, or they may wait too long
before building higher confidence.

[0033] Known classification methods require that all feature dimensions of a sample be correlated to the observation,
and then assume a Gaussian distribution parameterization to describe group clusters. However, this is inaccurate as
the data are not generally normally distributed. The present methods are able to determine cluster separation distances
of multi-modal (non-bell shaped) distributions, and save memory by not preserving each sample point in feature space.
Furthermore, prior art methods do not make optimal cluster classification decisions when sample distributions are
multi-modal. The system and method of the present invention make more appropriate group classifications as they
work with any arbitrary distribution shape.

[0034] Based on the superior and multi-faceted behavior modeling, the invention allows for accurate program or ad
program preference determination. By including rich temporal and sequential context information, the present system
predicts a user's context dependent preferences. The invention utilizes automatic learning methods, i.e., explicit user
input to best predict a certain user's preference. The present system dynamically adjusts preference prediction
parameters to use a higher weighting for the most predictive features in rating a content parameter.

[0035] The present system influences preference ratings with sample-to-sample rating trends that prior art systems
simply aggregate. By increasing (decreasing) a rating with better (lower) quality samples, a more accurate relative
preference metric is achieved. By better modeling preference behavior, therefore, the invention enables far superior
advertising and TV program targeting.

[0036] In the context of targeted advertising for TV and interactive TV, the prior art methods principally use
demographic information, not contextual behavioral information as part of the user Ad targeting profile. Accordingly,
targeting performance is reduced in non-demographically classifiable customer groups, and demographic inferring
accuracy.

[0037] The present system reduces profile size by using a less conservative statistical significance metric, thus further
reducing targeting template size, while preserving classification performance, by not downloading statistically
erroneous profile information. By inferring a TV user's targeting category membership as confidence derated distances
from simple local templates, the present method achieves very accurate proportional membership likelihoods because,
in contrast with the prior art, template profiles are not wrongly parameterized in statistical terms.

[0038] In addition to accurately classifying the viewer's preferences for targeted advertising, the invention further
enables the automatic creation of suggested program sequences. Here, we refer to a virtual channel program sequence
creation using stored preferred programming. The suggested program sequences can be presented as a separate
channel in the electronic programming guide (EPG). The virtual channel is superior to prior art systems in that the user
experiences the virtual channel EPG with the same look and feel as any other channel, except that the programs and
showing times are placed as the user would prefer. The virtual channel provides a higher level of preferential
programming than the simple listing of content that is available on the local storage. The novel system gives the TV
user the feel of an 'on demand' channel.

[0039] In system and business model terms, the present invention is directed to a targeted advertising (Ad) system
that provides:

■ An innovative clustering mechanism to create and determine the most representative television (TV refers to a
Digital Television or Analog television and Set Top Box receiver combination, both with program storage) user
profiles that best target individual user advertising category groups.

■ A client-side system to classify a TV user into one, or more, advertising group categories.

■ A contextual behavioral profiling system that determines a user's TV usage and content related preferences.

■ A behavioral model database that is queried by other system modules for user preferences, supporting behaviorally
targeted Ads, preferential virtual channel electronic programming guide (EPG) construction, preferential program
storage, and automatic programming recommendations.

[0040] The novel Ad targeting system infers a TV user's advertising category without requiring the viewer to explicitly
enter the information. An advertising category, herein, refers to a set of descriptive characteristics that groups a subset
of users into categories that can be correlated to a targeting interest of advertisers. Traditionally, these categories have
been based on demographic characteristics; however, the present invention expands user modeling, and targeting, to
also include behavioral metrics. Thus, a much more robust and refined Ad targeting system is possible. Unlike the
prior art, the present targeting system is driven not by program data but by behavior data. The fundamental premise of
this invention is that persons of a similar category will have certain behaviors that can be modeled and grouped with a
significant degree of consistency. The primary underlying aspect of the invention is to develop an accurate model of
the dynamic process, so that a clustering engine with a practical set of characteristic dimensions can efficiently
separate, or classify, the vast majority of viewers. In addition to automatically targeting advertising category members,
the goal is to apply the behavioral modeling engine and database to determine a TV user's contextual preference for
programming and Ads.

[0041] The present invention models TV program viewing as a double random pseudo Hidden Markov process,
where there is a hidden, low level, statistical state machine (SSM) modeling a behavioral cluster, and an observable
top level SSM that infers the active behavioral clusters and the interaction between them. The system is trained with
tagged learning data (e.g., real-time TV click stream data tagged with the demographic identity of viewers) of a
statistically representative TV viewing population sample. The classification model is a hybrid combination of a
parameterized random process, heuristics, and several single dimension behavioral metrics. A multiplicity of data
quality measures determines the statistical significance of, and confidence in, the training and test data.

[0042] The present invention includes an innovative sample size confidence measure. This metric estimates the bias
in the random process that drives the SSM, by calculating the ratio of expected state transition coverage, assuming
state transitions were chosen uniformly at random, to the actual number of different state transitions observed. The
ratio represents the state transition focus compared to random, and indicates the degree to which there are enough
samples to infer a non-uniform random process, specifically a viewer's personality, as meaningfully determining the
SSM structure.
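
The ratio described above can be sketched as follows. The disclosure does not give a formula for the expected coverage, so this sketch assumes the standard occupancy result: with K equally likely transitions and n observed transition events, the expected number of distinct transitions is K(1 - (1 - 1/K)^n). The function name `transition_focus_ratio` and the sample data are hypothetical.

```python
def transition_focus_ratio(transitions, num_states):
    """Ratio of expected distinct transitions (uniform random) to distinct transitions observed."""
    n = len(transitions)
    if n == 0:
        raise ValueError("at least one observed transition is required")
    possible = num_states * num_states  # assumption: every (from, to) pair is allowed
    # Assumed closed form for the expected coverage under uniform random selection.
    expected_distinct = possible * (1.0 - (1.0 - 1.0 / possible) ** n)
    observed_distinct = len(set(transitions))
    return expected_distinct / observed_distinct


# A habitual viewer who only alternates between two channels out of ten.
habitual = [("A", "B"), ("B", "A")] * 10
habitual_ratio = transition_focus_ratio(habitual, num_states=10)

# A viewer whose 20 transitions are all different, as random surfing would tend to produce.
scattered = [(i, i + 1) for i in range(20)]
scattered_ratio = transition_focus_ratio(scattered, num_states=10)
```

Under these assumptions a ratio well above 1 suggests focused, non-random behavior, while a ratio near or below 1 suggests that the samples do not yet distinguish the viewer from uniform random selection.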
[0043] The double random process model has several dimensions to capture a wide variety of typical, but often
unique, TV usage behaviors. In the preferred embodiment, each user's action, or selected non-actions, creates parallel
SSM transition events in each of three state categories: Channel, Genre, and Title. These state categories are further
subdivided into states of liked/unliked, and short_viewing/not-short_viewing characteristics. Inside of each categorical
state machine described are chronological dimensions that model time sensitive state transitions. The temporal
dimensions of the preferred embodiment model transition event chronology using a novel strategy that includes
day_of_week, time_of_day, time_after_TV_turn_ON, and time_since_last_change. The TV user's program selection
process, when observed through this time and transition sensitive model, reveals complex usage patterns that tend to
be unique to individuals, and more broadly to interesting classes of individuals. Behavioral sequences greater than
one transition, such as channel surfing, and a multiplicity of heuristic distributions, such as session watch times, and
psychometric parameters, such as genre curiosity, are used outside of the SSM as dimensions in a pseudo-Euclidean
classification space.
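
A minimal sketch of how one viewer action might be logged as parallel transition events with the four temporal dimensions is given below. The bucket boundaries, the hour-level granularity of time_of_day, the tuple layout, and the names `elapsed_bucket` and `record_transition` are illustrative assumptions; the liked/unliked and short_viewing substates are omitted for brevity.

```python
from collections import Counter
from datetime import datetime

CATEGORIES = ("channel", "genre", "title")


def elapsed_bucket(seconds):
    """Map an elapsed time to a coarse bucket (bucket edges are illustrative)."""
    for label, limit in (("<1min", 60), ("<10min", 600), ("<1h", 3600)):
        if seconds < limit:
            return label
    return ">=1h"


def record_transition(counters, prev, curr, when, tv_on_at, last_change_at):
    """Log one viewer action as a transition event in every state category."""
    context = (
        when.weekday(),                                         # day_of_week (Monday = 0)
        when.hour,                                              # time_of_day
        elapsed_bucket((when - tv_on_at).total_seconds()),      # time_after_TV_turn_ON
        elapsed_bucket((when - last_change_at).total_seconds()),  # time_since_last_change
    )
    for category in CATEGORIES:
        counters[category][(prev[category], curr[category], context)] += 1
    return context


counters = {category: Counter() for category in CATEGORIES}
prev = {"channel": "CNN", "genre": "news", "title": "Evening News"}
curr = {"channel": "ESPN", "genre": "sports", "title": "Game Night"}
context = record_transition(
    counters, prev, curr,
    when=datetime(2001, 1, 9, 20, 5, 30),
    tv_on_at=datetime(2001, 1, 9, 20, 0, 0),
    last_change_at=datetime(2001, 1, 9, 20, 5, 0),
)
```

Keying each count by the temporal context keeps the three categorical state machines in step while still letting each one be queried on its own.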

[0044] A novel clustering method combines the SSM transition models (using transition matrix parameterization 
techniques) and non-Gaussian parameter distributions (by defining unique histogram distribution distance measures) 
to determine user separability through a dimension voting architecture. Each dimension votes two clusters as separate 
if the mean separation distance between most of the points is greater than their separation variance. Surpassing a 
certain threshold number of dimensional separation votes determines if the clusters are separate. The percentage of
the dimensions that are voted as not separable between two clusters approximates their amount of overlap.
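
The dimension voting idea can be sketched as below. The disclosure relies on its own histogram distribution distance measures, which are not reproduced here; this sketch substitutes a much simpler per-dimension test (gap between cluster means compared with the average spread of the two clusters) purely to show the voting and overlap bookkeeping. All names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def dimension_votes_separate(a, b):
    """One dimension's vote: separate if the gap between means exceeds the average spread."""
    spread = (pstdev(a) + pstdev(b)) / 2.0  # simplified stand-in for the separation variance
    return abs(mean(a) - mean(b)) > spread


def clusters_separable(cluster_a, cluster_b, vote_threshold):
    """Return (separable, overlap) from per-dimension votes over shared dimensions."""
    votes = [dimension_votes_separate(cluster_a[d], cluster_b[d]) for d in cluster_a]
    overlap = votes.count(False) / len(votes)  # share of dimensions voted not separable
    return sum(votes) > vote_threshold, overlap


# Hypothetical per-dimension samples for two viewer clusters.
cluster_a = {"session_minutes": [30, 35, 40], "genre_curiosity": [0.2, 0.3, 0.25], "surf_rate": [5, 6, 7]}
cluster_b = {"session_minutes": [120, 110, 130], "genre_curiosity": [0.8, 0.7, 0.9], "surf_rate": [5, 7, 6]}

separable, overlap = clusters_separable(cluster_a, cluster_b, vote_threshold=1)
```

Here two of the three dimensions vote for separation, which surpasses the threshold of one vote, and the remaining third of the dimensions approximates the overlap between the clusters.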
[0045] In accordance with an added feature of the invention, the clustering engine (CE) software agent resides in a
modular computer system centrally located at the TV distribution head-end (called Ad manager) and creates template
behavioral profiles that correspond to targeted Ad categories of TV viewers. To learn the most general profile of a
particular target group, the CE is trained on only tagged viewing data from that group. The CE generalizes viewers'
profiles in each group into a representative aggregation for the respective Ad targeting categories. Ad category profiles
evolve by aggregating all dimensions most strongly in common for the group and most unique across target groups.

[0046] In accordance with another feature of the invention, the prototypical Ad group category behavioral profiles
are innovatively parameterized by the Ad and Ad program information (metadata) distribution organizer part of the Ad
manager (called Ad server) to compress the targeting models for the bandwidth-efficient download to advertising
category membership agents (MemberAgent) residing in field TVs.

[0047] In accordance with a further feature of the invention, the field TV MemberAgents reconstruct the downloaded
parameterized targeting models, and use a similar CE applied to the TV user's history, created by the TV profiling agent
(ProfAgent), to determine the most likely Ad categories the user belongs to and put the results in a user category
database. TargetingAndStorage Agents and Presentation agents (PresAgent) in the TV combine the targeting category
probabilities, and other relevant information (preference info), to selectively capture, store, and display the optimal
downloaded advertisements, including videos and banners, to the user.


[0048] The ProfAgents in the client or field TVs continually build a knowledgebase of preferences and contextual
transition behaviors that profile TV user(s) in the household. The ProfAgent models behavioral interaction with Ads
and regular, or entertainment, programs the same way, with, however, possibly different state category names.
Preferences for entertainment programs could include affinities for any metadata field or entries in an electronic
programming guide (EPG), such as titles, genres, channels, and actors. A transition event occurs between corresponding
program EPG entries (e.g., transitioning between programs with different channels and genres creates a channel and
genre transition accordingly). Ads have their version of EPG information that is similar to regular programs. The system
learns a user's Ad transition preferences the same way it does for regular programs, except the Ad's genre is its
product's Standard Industry Code (SIC), the Ad's title is the product's Universal Product Code (UPC) or SKU code,
and the system considers the Ad's actor as the corporate sponsor. Thus the identical data structures and algorithms
model user program and Ad transition behaviors.
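
The mapping described above, in which Ads reuse the program state categories, can be illustrated with the following sketch. The field names, the placeholder codes, and the assumption that an Ad's metadata also carries the channel it aired on are hypothetical and are not part of the disclosure.

```python
from dataclasses import dataclass

FIELDS = ("channel", "genre", "title", "actor")


@dataclass(frozen=True)
class ContentStates:
    """State categories shared by entertainment programs and Ads."""
    channel: str
    genre: str
    title: str
    actor: str


def from_program(epg_entry):
    """Programs use their EPG fields directly."""
    return ContentStates(epg_entry["channel"], epg_entry["genre"],
                         epg_entry["title"], epg_entry["actor"])


def from_ad(ad_metadata):
    """Ads reuse the same slots: genre = SIC, title = UPC or SKU, actor = corporate sponsor."""
    return ContentStates(ad_metadata["channel"], ad_metadata["sic"],
                         ad_metadata["upc"], ad_metadata["sponsor"])


def transition_events(prev, curr):
    """List the state categories in which a transition occurred."""
    return [field for field in FIELDS if getattr(prev, field) != getattr(curr, field)]


program = from_program({"channel": "5", "genre": "drama", "title": "Show A", "actor": "Actor X"})
ad = from_ad({"channel": "5", "sic": "SIC-PLACEHOLDER", "upc": "UPC-PLACEHOLDER", "sponsor": "Sponsor Y"})
events = transition_events(program, ad)
```

Because both kinds of content share one record type, the same transition logic applies to program-to-Ad, Ad-to-Ad, and program-to-program changes.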

[0049] This information is provided by the head-end in the Ad's metadata in the same way a program's content
information metadata and EPG precede the broadcast. Hence, the ProfAgent learns product and sponsoring company
preference for Ads in the same way genre and actor preferences are learned, as described in detail herein. This enables
the targeting of Ads to not only a user's inferred demographic, but to their specific product, corporate branding, or
general product category interests. For example, through SIC the ProfAgent can learn if a user likes financial services
or automobile Ads. Similarly, a Pepsi Cola branding campaign could target users who like the soda SIC, or more
specifically Coca Cola named Ads. In another aspect, using the UPC, the Gillette company could target users that
specifically liked Ads of a Remington model 3000 electric shaver. In yet another aspect of targeting, an Ad
agency could target users that, for example, like Apple Computer Company commercials, but do not otherwise like
computer Ads. This user may be entertained by their Ads, but have no interest in their product. This could be an
opportunity for the Ad agency to focus an infomercial Ad to the user, to bridge the user from brand awareness to product

[0050] Over time, a vast relational knowledge base learns very valuable associations between user TV usage
behavior, demographics, programs, and Ad preferences. This knowledge base not only increases Ad targeting within the
TV but also has a revenue generation potential by marketing the aggregated personal information to third parties.
[0051] In one instance of the present invention, a TV ProfAgent models patterns of TV usage behaviors with a
behavioral model (BM) similar to the clustering engine used at the TV head-end, and extracts key usage information from
the BM into a behavioral database. Each entry of the behavioral database has a confidence value generated by a
multiplicity of novel techniques presented in detail herein. The database entry confidence registered by the ProfAgent
reflects an estimate of the structural and sampling quality of the data used to calculate the database entry.
[0052] The TV receives Ad targeting metadata with restricting query terms to display the associated Ad only to
selected users with database entries matching the query constraints. Each Ad metadata query term has a minimum
confidence threshold term that specifies the lowest confidence level in satisfying the query term, or terms, acceptable.

[0053] For example, an Ad targeting constraint such as 'gender:Male@80% AND age:25-35@50%' has the effect
of only showing the Ad to users the TASAgent predetermined had at least 80% confidence in being male, and at
least 50% confidence in being between 25 and 35 years of age.

[0054] In another aspect of confidence level specification, there is an expression-level confidence threshold as
follows: '(gender:Male AND age:25-35)@80%'. This targeting mode selects for Ad display only users that the system
has at least 80% confidence in being male and between 25 and 35 years of age. These methods provide flexibility by
enabling Ads to specify the most important targeting selection terms, or to specify a range of people that are close
enough to the desired targeting profile to show the Ad to. The TargetingAndStorage agent (TASAgent) only selects profiles
from the database whose aggregate per dimension confidence rating satisfies the query limits set by the Ad targeting
metadata.
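
The two thresholding modes described above can be sketched as follows. This is a minimal illustration only: the names `Term`, `matches_per_term` and `matches_expression` are hypothetical, and taking the joint confidence of an expression as the minimum of its term confidences is an assumption, since the text does not state how the aggregate is formed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    dimension: str   # e.g. "gender"
    value: str       # e.g. "Male"
    min_conf: float  # per-term minimum confidence in [0, 1]


def matches_per_term(profile, terms):
    """Per-term mode: every term must meet its own confidence threshold."""
    return all(profile.get((t.dimension, t.value), 0.0) >= t.min_conf for t in terms)


def matches_expression(profile, keys, min_conf):
    """Expression-level mode: one threshold for the whole conjunction.

    Assumption: the joint confidence is the minimum of the term confidences.
    """
    joint = min(profile.get(k, 0.0) for k in keys)
    return joint >= min_conf


# Hypothetical database entry confidences for one user.
profile = {("gender", "Male"): 0.85, ("age", "25-35"): 0.60}

# 'gender:Male@80% AND age:25-35@50%'
show_per_term = matches_per_term(
    profile, [Term("gender", "Male", 0.80), Term("age", "25-35", 0.50)]
)
# '(gender:Male AND age:25-35)@80%'
show_expression = matches_expression(
    profile, [("gender", "Male"), ("age", "25-35")], 0.80
)
```

With these illustrative numbers the per-term query selects the user, while the stricter expression-level query does not.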

[0055] In yet another aspect of the confidence thresholding system, the query selection filter is stated as a Fuzzy
Logic, and not Boolean, expression. The targeting query expression is similar to the probabilistic percentage confidence
terms with two notable exceptions: fuzzy membership literals replace the percentage terms, and a fuzzy literal table
synchronizes client and server.

[0056] By way of example, the query expression mode appears as follows:

'gender:Male@VERY_SURE AND age:25-35@FAIRLY_SURE'

[0057] This query would select users who the TASAgent was very sure are male, and fairly sure are between 25
and 35 years of age. A fuzzy literal table (FLT) lists the allowable range of fuzzy memberships each advertising category
may exhibit. An example of a fuzzy literal table (FLT) is:

Male: [UNSURE, FAIRLY_SURE, VERY_SURE]

Age: [UNSURE, FAIRLY_SURE, VERY_SURE, CERTAIN]

[0058] The advantage of the latter expression method is that the novice Ad agency only specifies the degree of
confidence required in intuitive, non-mathematical terms, and leaves the exact range of confidence percentages up
to the TASAgent to decide, and continually optimize. Additionally, the fuzzy method handles the non-deterministic
meaning of the percentage confidence terms in the database. The TASAgent learns the percentage confidence rating ranges
historically associated with each fuzzy performance level.
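
A fuzzy literal query of this kind could be resolved on the client roughly as sketched below. The lower bounds in `FUZZY_LOWER_BOUND` are placeholders (the text says the TASAgent learns the actual ranges), and the helper names `term_ok` and `select_user` are hypothetical.

```python
# Placeholder lower confidence bound per fuzzy literal. These numbers are
# assumptions; in the described system they would be learned over time.
FUZZY_LOWER_BOUND = {
    "UNSURE": 0.0,
    "FAIRLY_SURE": 0.5,
    "VERY_SURE": 0.8,
    "CERTAIN": 0.95,
}

# Fuzzy literal table (FLT): literals each advertising category may exhibit.
FLT = {
    "Male": ["UNSURE", "FAIRLY_SURE", "VERY_SURE"],
    "Age": ["UNSURE", "FAIRLY_SURE", "VERY_SURE", "CERTAIN"],
}


def term_ok(category, literal, confidence):
    """True if the stored confidence reaches the literal's lower bound."""
    if literal not in FLT[category]:
        raise ValueError(f"{literal} is not allowed for {category}")
    return confidence >= FUZZY_LOWER_BOUND[literal]


def select_user(confidences, terms):
    """AND over (category, literal) terms of a fuzzy targeting query."""
    return all(term_ok(cat, lit, confidences.get(cat, 0.0)) for cat, lit in terms)


# 'gender:Male@VERY_SURE AND age:25-35@FAIRLY_SURE'
terms = [("Male", "VERY_SURE"), ("Age", "FAIRLY_SURE")]
selected = select_user({"Male": 0.9, "Age": 0.6}, terms)
rejected = select_user({"Male": 0.7, "Age": 0.6}, terms)
```

Keeping the literal-to-range mapping in one table means the Ad metadata never carries percentages, so the client can retune the ranges without any change to the queries.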

[0059] Other features which are considered as characteristic for the invention are set forth in the appended claims.
[0060] Although the invention is illustrated and described herein as embodied in a system and method for behavioral
model clustering in TV usage and targeted advertising and preference programming, it is nevertheless not intended
to be limited to the details shown, since various modifications and structural changes may be made therein without
departing from the spirit of the invention and within the scope and range of equivalents of the claims.
[0061] The construction of the invention, however, together with additional objects and advantages thereof, will be
best understood from the following description of the specific embodiment when read in connection with the
accompanying drawings.

Brief Description of the Drawing: 
[0062] 

Fig. 1 is a block diagram illustrating the most important modules of the system operator part of the system for 
program or ad targeting according to the invention; 

Fig. 2 is a block diagram of a behavioral cluster engine, forming a part of the system according to the invention;

Fig. 3 is a diagrammatic overview over a hidden Markov model with double random processing; 

Fig. 4 is a diagram illustrating a statistical state machine with three state spaces represented in probability density 
functions; 

Fig. 5 is a diagram of an exemplary channel transition matrix representing a state machine; 

Fig. 6 is a block diagram of a targeting server representing an advertising category, behavioral prototype learning 
system; 

Fig. 7 is a block diagram expanding on the intra-profile pruning in the cluster aggregator section of the targeting
server of Fig. 6;

Fig. 8 is a block diagram of a client-side advertising category classification system;

Fig. 9 is a block diagram of an ad targeting system showing the server side and the client side;

Fig. 10 is a block diagrammatic overview of a preference determination engine architecture;

Fig. 11 shows three graphs recording various non-surf program watching ratios (with 1657 user inputs);

Fig. 12 shows four graphs recording hopping behavior statistics for 1657 users;

Fig. 13 is a state sequence model;

Fig. 14 shows four graphs with typical user behavioral statistical distributions with weekly recording of 1657 users; and

Fig. 15 shows four graphs illustrating various additional parameter distributions for the exemplary 1657 users.

Detailed Description of the Preferred Embodiments:

[0063] Referring now to the figures of the drawing in detail and first, particularly, to Fig. 1 thereof, there is shown a
diagrammatic overview of a system according to the invention. The core of the invention is the application of a hidden
Markov chain and user behavior statistics to model and to predict a TV viewer's demographic group and/or the most
popular behavior for an individual demographic group. The main goal is to predict a given viewer's demographic group
and/or what programs the viewer would like to watch, and to improve the prediction and modeling accuracy as more
realtime viewing data become available.

[0064] The system provides two ways to predict a viewer's demographic group, namely, via a dynamic demographic
cluster (DDC) knowledge base, and based on similarities between what a viewer watches and the virtual channels
predicted by the PDM for the demographic groups.

[0065] As noted above, the primary objects of the invention deal with the targeting of advertising content and program
content to a viewer or group of viewers who meet certain demographic requirements if such a requirement is given.
[0066] The system depicted in Fig. 1 is separated into a head end and a client. Programming, ad content, and
sequencing of TV content are determined at the head end. The program stream information is transmitted to the client side
in a multi-program stream. As indicated by the dashed line, returning from the receiver to the head end, a program
selection feedback provides for realtime information regarding the client's viewing behavior. While the feedback
connection is generally available in digital cable systems and other direct connection systems, the invention can also be
implemented without the direct feedback. Details of the realtime feedback and the sampled feedback embodiments
will emerge from the following description.

[0067] The data supplied by outside resources include information concerning the viewing monitor information of all
demographic groups which advertisers or content providers may be interested in. Those variables include (a) watch
date, (b) watch start time, (c) watch duration, (d) watch channel, and (e) the viewer's demographic information such
as age, sex, and the like. The input data further include the information of the incoming electronic program guide (EPG).
[0068] The historical data play a role as a pre-knowledge of the demographic groups. These data define the viewers'
behavioral information. The system knowledge is limited to those demographic groups at the beginning.
[0069] The core of the invention - concerning the acquisition of data for the necessary behavioral model - is the
demographic cluster knowledge base acquirer based on the hidden Markov model. The input of the module is the
behavioral data and, if available, the click stream feedback. The output of the module is the knowledge base in the
form of a transition matrix with weight sets that will be discussed in the following text.

[0070] The invention further provides for improvements in the modeling and prediction based on feedback information
which includes realtime behavioral data in the form of click streams (e.g., remote control or TV set click sequences).
[0071] Turning now to various details of the novel system, Fig. 2 illustrates a pseudo Euclidean behavioral cluster
engine (BCE) architecture according to one embodiment of the present invention. A pseudo hidden Markov model
(pHMM) 1 captures behavioral state transitions. A heuristic behavioral metrics (HBM) sub-block 2 algorithmically detects
and statistically represents a multiplicity of predictive TV user characteristics. The TV user's TV control stream 3, e.g.,
a remote control click stream, feeds into the pHMM and HBM. These blocks are parameterized into a highly dimensional
classification space 5 delivering a spatial cluster of the training data to subsequent modules 6.
[0072] Fig. 3 is a diagram focusing on the pHMM block of Fig. 2. Several statistical state machines work in tandem
to model the user click stream. The preferred embodiment has multiple hidden, low level, behavioral processes, and
a top-level user transition process. The hidden random processes include channel, genre, and title state spaces
operating in parallel. The top-level random process, or statistical state machine (SSM), models the likelihood that
certain behavioral process activations, and other heuristic behavioral factors, infer a particular user. Each state space
has a temporally sensitive transition subspace that tracks various time-dependent user behaviors.
[0073] The pseudo Euclidean Behavioral Clustering Engine (BCE) architecture of Fig. 2 represents one embodiment
of the present invention. It includes a pseudo Hidden Markov Model (pHMM) to capture behavioral state transitions. In
general, the 'pseudo' qualifier indicates this system departs from the traditional definition, but maintains substantial
similarities as enumerated after a brief description of the HMM.






[0074] An HMM is a double random process that has an underlying random process that is not observable and,
therefore, hidden. However, some aspects of this hidden process are observable through another random process or a set
of random processes. The observed random process produces a sequence of symbols, in the present case likely user
categories, that we may measure with certain statistical properties. The model seeks to describe both the short time
variations in the random process, as well as the steady state features. Of particular concern are the transitions from
one interval to another. We generally assume that statistical laws govern the observed temporal variations in the TV
viewing process.

[0075] The goal of the BCE is to model and group the TV usage and content selection time series data patterns
generated by the TV remote control, or TV, buttons pressed, herein referred to as 'click-stream'. There are two forms of
click-stream data: real-time and statistically sampled.

■ Real-time data is what the actual TV system registers from the user TV control commands. This data is sampled
at a high rate, can come from any TV control button, and may be accumulated temporarily in a file for later use.

■ Statistically sampled data are recorded by a third party, such as Nielsen or Arbitron. Such data generally have detailed
user information and limited time resolution, and only log channel changes of a statistically representative sub-group of
the TV viewing population.

[0076] Two primary aspects of the Ad targeting system directly depend on click-stream data. First, the profiling agent
(ProfAgent) on the TV models TV users through a real-time click-stream and program content information or electronic
programming guide (EPG) data. Second, the BCE uses statistically sampled click-stream data and corresponding past
EPG data to build advertising category behavioral clusters. All statistically sampled TV user data in the present
document use digitally recorded market research by BARB TV Research Corp. (London, England) of 1657 British satellite
TV users over a six month period in 1998.

[0077] A plurality of heuristic measures estimate user preference for TV programs, categories of programming, and 
certain user behaviors. With these preference measures, a preference determination engine (PDE) uses a voting based, 
reinforced learning system to assign preference ratings to all EPG entries. 

[0078] The preference determination engine (PDE), the architecture of which is illustrated in Fig. 10, provides likes
predictors for all demographic groups, i.e., for a group instead of an individual person. The difference between a person
and a group is that the determination engine has to pick up the programs which will please a majority of the people in
a demographic group for a particular time. Additional details of the implementation of the PDE may be found in the
commonly assigned, copending patent application No. [Attorney Docket No. P02408US], which is herewith incorporated
by reference.

[0079] The group probability may be determined as follows: 



where is the probability that a person j of a demographic group i likes the program on channel C at time t; N is
the number of persons in the demographic group i who have the highest probability of watching channel C at the time t.
[0080] Then, the channel picked up is the one with the highest group probability.
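
One plausible reading of the group selection step is sketched below. It assumes the group probability is the plain average of the per-person probabilities over the N members considered; that averaging rule and the function names are assumptions made for illustration, not taken from the text.

```python
def group_probability(member_probs):
    """Average of per-person probabilities of liking one channel at one time.

    Assumption: a simple mean over the N members considered.
    """
    if not member_probs:
        return 0.0
    return sum(member_probs) / len(member_probs)


def pick_channel(probs_by_channel):
    """Return the channel whose group probability is highest."""
    return max(probs_by_channel, key=lambda c: group_probability(probs_by_channel[c]))


# Hypothetical per-person probabilities for one demographic group at time t.
probs = {
    "C1": [0.9, 0.2, 0.4],  # mean 0.5
    "C2": [0.6, 0.7, 0.5],  # mean 0.6
}
best = pick_channel(probs)
```

Here channel C2 wins even though one member strongly prefers C1, which matches the stated aim of pleasing a majority of the group.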

[0081] The preference determination engine architecture illustrated in Fig. 10 includes a plurality of preference
sensing filters 10 (PSF), a behavioral model database 11 (BMdbase), a voting generation layer, output voting weights, and
a reinforced teaching mechanism. Each preference sensor filters user behavioral patterns into an analog value
proportional to the degree the targeted behavior occurs. The PSF and pHMM receive user click-stream data from a click
stream sensor 12, and dynamically maintain the BMdbase 11. The present BM embodiment includes the following
novel PSF as defined in the following items:

1. Time_watched/Time_available, for non-surfing and non-hopping programs

2. Time_missed/Time_available, for all non-surfing programs

3. Time_left/Time_available, for all non-surfing programs

4. Time_watched/Time_available, for programs with hopping

5. Time_between_hops, for programs with hopping

6. Number of hops per program

7. Time independent visitation bias for all EPG entries

8. Time in program (TIP) visitation bias for all EPG entries

9. Time in session (TIS) visitation bias for all EPG entries

10. Time of day (TOD) visitation bias for all EPG entries

11. Day of week (DOW) visitation bias for all EPG entries.

While the terms are self-explanatory, we provide a definition of terms as follows:

[0082] In item 1, the term Time_watched (T_w) is the total watch time, possibly not contiguous, for a particular program.
Time_available (T_a) is the program length. The ratio Time_watched/Time_available (T_w/T_a) indicates how much of a
program the user viewed, and statistically reflects their interest in the content.

[0083] In item 2, the term Time_missed (T_m) is the amount of time a user comes late to a program (negative if
early). The ratio Time_missed/Time_available (T_m/T_a) reflects the user's eagerness, and possible planning, to see the
start of the program, hence a greater program preference than if the user often starts late. If negative, it is an especially
strong indicator that the user spent more effort in planning, and hence has a greater preference, to see the program's
start.

[0084] In item 3, the term Time_left (T_l) is the time of the program's end minus the time the user leaves the program
(negative if stayed after end). Time_left/Time_available (T_l/T_a) is a ratio to gauge a user's interest in not missing the
program's end, hence a greater preference for the program than if the user often leaves early.

[0085] Taken together these measures determine the quantity and quality of possible time spent watching a program.
As shown in Figure 10, most people tend to tightly group in each of these metrics. The result is a good correlation with
program preference. Exactly similar conclusions and measures as T_w/T_a, T_m/T_a, and T_l/T_a apply for advertisements
as well as programs.
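
The three ratios defined in items 1 to 3 can be computed from a program's schedule and the user's viewing intervals, as in the following sketch. The function name `time_ratios`, the use of minutes on a common clock, and the clipping of watch time to the program boundaries are assumptions made for illustration.

```python
def time_ratios(program_start, program_end, watch_intervals):
    """Return (T_w/T_a, T_m/T_a, T_l/T_a) for one program.

    Times are minutes on a common clock; watch_intervals is a list of
    (tune_in, tune_out) pairs for this program, possibly non-contiguous.
    """
    t_a = program_end - program_start  # Time_available: program length
    if t_a <= 0 or not watch_intervals:
        raise ValueError("need a positive program length and at least one interval")
    # Time_watched: only the portion inside the program counts (assumption).
    t_w = sum(
        max(0, min(out, program_end) - max(tin, program_start))
        for tin, out in watch_intervals
    )
    first_in = min(tin for tin, _ in watch_intervals)
    last_out = max(out for _, out in watch_intervals)
    t_m = first_in - program_start  # Time_missed: negative if the user came early
    t_l = program_end - last_out    # Time_left: negative if the user stayed past the end
    return t_w / t_a, t_m / t_a, t_l / t_a


# A 60 minute program watched from minute 6 to 21 and again from 30 to 54.
ratios = time_ratios(0, 60, [(6, 21), (30, 54)])
# A user who tuned in 5 minutes early and stayed to the end.
early = time_ratios(0, 60, [(-5, 60)])
```

In the first call the user watched 39 of 60 minutes, arrived 6 minutes late and left 6 minutes early; in the second the negative T_m/T_a marks the early arrival that the text treats as a strong preference signal.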

[0086] Preference metrics related to program hopping, items 4 through 6, estimate program preference in relation
to the hopping behavior. A program hop is the act of leaving and returning to the same program. A program surf is the
act of going to, and leaving from, a program within a certain short period of time, e.g., 5 minutes. When a user returns
to a program, that is a strong indicator that there is something about the program worth returning to, or liked. Figure
11a graphs the T_w/T_a for programs with hopping. It will be understood that programs with hopping are watched longer
than those in the non-hop case. A hop indication thus is an indication of greater preference.

[0087] In item 5, the term Time_between_hops designates the time (in minutes) the user was away from the original
program before returning. As illustrated, most users had hops that were less than 2% of the program. Beyond the
intuitive appeal of this metric, this data, combined with the above preference bias for programs with hopping,
demonstrates a tendency that the shorter the time away, the more the user prefers the original program.
[0088] With a similar intuitive and observational appeal, the number of hops per program of item 6 is an inversely
proportional proxy for user program preference. That is, the more often a user hops back and forth in a program, the
less they like the program. Figure 11c indicates that most people hop only once or twice in a program with any hopping.
It is important to note that the higher frequencies of hopping are increasingly rare, and thus not preferred, since most
hopped programs have a relatively high T_w/T_a ratio.

[0089] Yet another novel class of program preference metrics is based on a unique method to determine user
behavioral bias, items 7 through 11. User bias, as used herein, is the prejudicial focus of behaviors to select a
significantly limited subset of possible choices. With this concept as the motivation, bias takes the theoretical form of a ratio
of the expected uniformly random selection spread versus the observed behavioral selection spread. User behavioral
bias is a psychometric tool that measures the psychological bias of a user to choose a target behavior over other
options in its behavioral domain. A mathematical treatment of the bias metric subsequently follows a summary of its
application in user program preferences.

[0090] The behavioral bias metric can determine if a selection in question has enough evidence to infer that user
selection is a preferred action. More specifically, it indicates the likelihood of a non-uniformly random selection process.
For example, if the selection of a certain channel occurs with the same likelihood as random, then there is no evidence
of a user selection bias, and the channel is assumed as not preferred. In the converse situation of the selection being
several times more likely than random, then the channel is deemed preferred.
[0091] Similarly, the preference vote of item 7 returns the overall bias to visit any particular EPG entry. Items 8 to
11 are temporally restricted bias queries.

[0092] Item 9 returns a vote on the bias to make a certain selection at a specified time after the start of a TV viewing
session. A session is the TV usage period that starts at the turn on, and ends at the turn off, of the TV. A visitation bias
is the content, or category, visitation frequency over random.

[0093] Similarly, item 10 restricts the bias query to the time of day, and item 11 is a day of the week selection
preference query vote. The prescribed bias metrics are superior to the prior art in that the latter calculate absolute
frequencies of something happening and not the relative likelihood of a particular observed event frequency occurring. In any
short period of time a uniformly random process can appear to prefer some possibilities over others and trick a frequency
based preference determining system to infer a user preference. However, in such a circumstance, the present bias
detection scheme would indicate there are insufficient samples or they are not concentrated enough to infer a user
preference. This has the effect of lowering the confidence in the poor quality, visitation frequency based metrics, and
lowers their contribution to the final program preference evaluation, thus increasing accuracy by rejecting statistically
erroneous sample data.
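
The idea of a visitation bias as "frequency over random" can be sketched as follows. The function name `visitation_bias` and the `min_samples` cutoff are hypothetical; the cutoff is only a stand-in for the insufficient-sample check the text describes, not the patent's actual confidence method.

```python
from collections import Counter


def visitation_bias(selections, domain_size, min_samples=20):
    """Map each observed choice to observed frequency / uniform frequency.

    A bias near 1 means the choice is no more likely than random; a bias of
    several means it is several times more likely than random. Returns an
    empty dict when there are too few samples to infer anything
    (min_samples is an assumed cutoff).
    """
    n = len(selections)
    if n < min_samples or domain_size <= 0:
        return {}
    counts = Counter(selections)
    # (count / n) / (1 / domain_size), written to avoid rounding noise.
    return {choice: count * domain_size / n for choice, count in counts.items()}


# Hypothetical log: 40 channel selections out of a 10 channel line-up.
log = ["BBC1"] * 30 + ["ITV"] * 10
bias = visitation_bias(log, domain_size=10)
too_few = visitation_bias(["BBC1"] * 3, domain_size=10)
```

In this example BBC1 is visited 7.5 times more often than a uniformly random viewer would visit it, while three samples alone yield no bias estimate at all.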

[0094] In addition to EPG entry related preference, several behavior related preferences are contemplated, such as
those described herein. Behavioral preferences provide a mechanism to make program preference predictions in the
context of a user's past patterns of action. It is often the case that a viewer's mood, or contextual circumstances including
temporal cues, can influence preferences in a way that has a program liked in one context, and not preferred in
another. To the extent the BCE models the behavioral context of interest, a more accurate programming preference
prediction is possible.

[0095] The behavioral model database BMdbase 11 of Fig. 10 is serviced by a standardized behavior model query
engine. The corresponding target query of the behavioral model (BM) data will now be described in the following:
[0096] All modeled behaviors and temporal relationships in the BCE, as described herein, serve as the BMdbase
for system modules to query with the viewer's real-time usage pattern in a TV session, and not simply make the
estimation using the user's average preference for a program. The BMdbase is a behavioral preference query server to
any system module requiring certain behavioral likelihoods to make a more optimal decision. Modules that query the
BMdbase include the preference determination engine (PDE) 13, the TASAgent, and the PresAgent. The behavioral
model query engine (BMQengine) services all search queries to the BMdbase.

[0097] Any environment state variable (such as TV volume), or EPG entry (e.g., channels), or their derivatives, is
potentially a hyperplane in dimension 1, below, of the behavioral model. The preferred behavioral transition model has
five dimensions (Dim) as follows:

• Dim 1 - [likedTitle, likedChannels, unlikedChannels, surfChannels, likedGenres, unlikedGenres, surfGenres]

• Dim 2 - [from State code, i.e., channel, or genre, or Title ID number]

• Dim 3 - [to State code, i.e., channel, or genre, or Title ID number]

• Dim 4 - [nonTemporal, DOW, TOD, TIS, TIP]

• Dim 5 - [temporal fuzzy bin]



[0098] Dim 1 selects the type of state variable.
[0099] Dim 2 sets a constraint for the 'from' state of interest with the reference ID. A 'from' state is the state the viewer
leaves when making a state transition.

[0100] Dim 3 sets the 'to' state ID for the query. The 'to' state is defined exactly as the 'from' state, except it is the
state a viewer goes to upon a state transition. The reference IDs could be channel call letters, such as 'ABC', genre
names such as 'movie', or title hash codes.
[0101] Dim 4 sets the type of temporal relationship; and

[0102] Dim 5 sets the corresponding time interval; e.g., valid selections for DOW are: Mon., Tue., Wed., Thurs., Fri., Sat.,
and Sun.

[0103] There are approximately 14 to 16 fuzzy membership categories that provide a dependable fuzzy model. The
number of members within the categories may be set to vary from about seven for relatively unimportant membership
categories (e.g., time_left/time_watched ratio), to about 17 for the most important categories (e.g., time watched).

[0104] In addition to BM search constraints, there are functional specifications (specs). Function specs include, but
are not limited to, the following:

QueryFunction - [mostLikely, leastLikely, Top_n, Bot_n, time_sum]

[0105] The 'most' (least) likely search function returns the highest (lowest) probability states and bias values that
satisfy the rest of the query constraints. 'Top_n' (Bot_n) returns the 'n' highest (lowest) probability states and
bias values that satisfy the rest of the query constraints. The 'time_sum' function aggregates all the bias terms (by
averaging, or counting, etc.) in each specified TimeType's TimeValue intervals. Thus, a general transition query format
is:

'[QueryFunction] [StateType] [fromStateID] [toStateID] [TimeType] [TimeValue]'
[0106] By way of example, the following query searches for the top 5 liked genres on Sunday:

'QueryFunction = Top_n=5, StateType = LikedGenres, fromStateID = null, toStateID = null, TimeType = DOW,
TimeValue = Sunday'

[0107] A typical query result, where 100 is the maximum preference, is
[action = 60, news = 40, comedy = 30, null, null] if only three genres were liked on Sundays.
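
The general transition query format and the Top_n example above can be mirrored in a small data structure, as sketched below. The class `TransitionQuery`, the row layout of `BM_ROWS` and the function `run_top_n` are hypothetical; the sketch handles only the Top_n function and treats a null field as an unconstrained dimension.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransitionQuery:
    query_function: str                  # only "Top_n" is handled in this sketch
    state_type: str
    from_state_id: Optional[str] = None  # None plays the role of 'null'
    to_state_id: Optional[str] = None
    time_type: Optional[str] = None
    time_value: Optional[str] = None
    n: int = 1


# Toy rows: (state_type, from_id, to_id, time_type, time_value, preference 0..100)
BM_ROWS = [
    ("LikedGenres", None, "action", "DOW", "Sunday", 60),
    ("LikedGenres", None, "news", "DOW", "Sunday", 40),
    ("LikedGenres", None, "comedy", "DOW", "Sunday", 30),
    ("LikedGenres", None, "drama", "DOW", "Monday", 80),
]


def run_top_n(query, rows):
    """Return the n best (state, preference) pairs, padded with None."""
    if query.query_function != "Top_n":
        raise NotImplementedError(query.query_function)
    wanted = (query.state_type, query.from_state_id, query.to_state_id,
              query.time_type, query.time_value)
    hits = [r for r in rows
            if all(w is None or w == v for w, v in zip(wanted, r[:5]))]
    hits.sort(key=lambda r: r[5], reverse=True)
    top = [(r[2], r[5]) for r in hits[:query.n]]
    return top + [None] * (query.n - len(top))


sunday_top5 = run_top_n(
    TransitionQuery("Top_n", "LikedGenres", time_type="DOW",
                    time_value="Sunday", n=5),
    BM_ROWS,
)
```

With this toy data the result has three genres followed by two empty slots, matching the shape of the example result above.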
[0108] A typical use of 'time_sum' is to search for the most likely time of activity for a given StateType. For example:
Find the top 3 most likely times of day a user watches TV. The query is

'QueryFunction = time_sum, StateType = LikedChannels, fromStateID = null, toStateID = null, TimeType = TOD,
TimeValue = null'

[0109] In this case, the 'time_sum' function will aggregate all transition biases per TOD interval, and return a list of
results. If the person is most active in the mornings, evenings, and late night, then a typical query response could be
(';' implies a new row, and ',' a new column):

[(late_night, very_often); (wee_hours, never); (early_morning, never); (morning, mostly); (late_morning, rarely);
(after_noon, rarely); (late_after_noon, sometimes); (evening, almost_always); (night, typically)]

[0110] A module with a defuzzification table searches the returned matrix for the top three likelihoods, namely,

[(morning, always); (late_night, very_often); (evening, almost_always)].
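
The defuzzification step can be pictured as ranking the fuzzy literals and keeping the best rows, as in the sketch below. The ordering in `FUZZY_ORDER` is an assumption chosen for illustration; the actual defuzzification table is not given in the text.

```python
# Assumed ranking of fuzzy frequency literals, from least to most frequent.
FUZZY_ORDER = [
    "never", "rarely", "sometimes", "typically",
    "mostly", "very_often", "almost_always", "always",
]


def top_k_intervals(response, k=3):
    """Return the k (interval, literal) rows with the highest-ranked literal."""
    rank = {literal: i for i, literal in enumerate(FUZZY_ORDER)}
    return sorted(response, key=lambda row: rank[row[1]], reverse=True)[:k]


# The time_sum response from the example above, one row per TOD interval.
response = [
    ("late_night", "very_often"), ("wee_hours", "never"),
    ("early_morning", "never"), ("morning", "mostly"),
    ("late_morning", "rarely"), ("after_noon", "rarely"),
    ("late_after_noon", "sometimes"), ("evening", "almost_always"),
    ("night", "typically"),
]
top_three = top_k_intervals(response)
```

Under this assumed ordering the three selected intervals are evening, late night and morning, the same three intervals named in the example.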
[0111] A multitude of standardized query interfaces are readily practical to interface with the BMQengine. For
example, an SQL interface would specify the dimensional attributes as 'SELECT...FROM...WHERE' clauses; e.g., if the
most likely, or popular, TIME for watching MOVIE.ACTION is in the evening, the SQL query is:
(SELECT view_start_time FROM preferences
 WHERE genre_main = 'movie'
 AND genre_sub = 'action'
 AND view_day_of_week = ( SELECT view_day_of_week FROM preferences
     WHERE genre_main = 'movie'
     AND genre_sub = 'action'
     GROUP BY view_day_of_week
     HAVING MAX(BIAS_view_day_of_week))
 GROUP BY view_start_time
 HAVING MAX(BIAS_view_start_time));

[0112] A basic SQL interpreter converts SQL search parameters into BMQengine dimensional attribute constraints:

QueryFunction = mostLikely, StateType = LikedGenre, fromStateID = null, toStateID = movie:action, TimeType =
TOD, TimeValue = evening.

[0113] Although a wide variety of modeled behaviors, and query architectures are contemplated, there are still many
others. The following is a general enumeration of some behavioral preference categories. Here, the outputs depend
on the real-time viewing context:

1. Time sensitive transition preferences for all EPG entries

2. State-sequencing

3. Transition reversal bias

4. Time watched per session, and per all EPG entries

5. TV control patterns of behavior

6. T_w/T_a, T_m/T_a, and T_l/T_a for all EPG entries

7. EPG entry and behavioral diversity focus (breadth, depth search control)

8. Most likely starting, or ending, state

[0114] In item 1 the BM produces time sensitive, and time independent transition likelihoods for any EPG entry. Every
EPG entry class is further segmented into a plurality of behavioral categories, including surf/non-surf, hopping, and
liked/unliked states, as previously defined. Each user action creates transition statistics in each of these domains
according to the SSM algorithm.

[0115] An example is a query to the BMQengine for the likelihood that a particular actor (or any EPG field entry) is
watched after watching the news (or any other EPG field entry), with no time constraints. A typical time sensitive
preference query would look like:

'what is the likelihood of watching sports on Monday (or any day)

AND

in the evening (or any time),

AND

after watching TV for an hour (or any amount of TV watched time),

AND

while halfway into the current program (or any amount of program watched time)'.
[0116] Importantly, the query can be relaxed with fewer conjunctive terms, or tightened with more constraints. Using
this mechanism, a system module can pass the real-time contextual information to the BMQengine, and discover that
although the user loves sports in general, he/she does not like to watch it on Monday evenings within an hour of
watching TV midway into another program. This highly specific case is a demonstration of the high detail of learning
and contextual preference possible in the present system. The system module sends the full range of query abstraction
levels, i.e., from very general (e.g., find liked genres), to a fully conjunctive EPG, temporal, and state sequencing
likelihood search. The system module finds the most likely level of query abstraction, and rates programming by its
distance from those query parameters.

[0117] For example, if a sports program is to be rated, the first, and most abstract query, might be 'is sports liked',
then subsequent queries will increasingly add constraining terms depending on the real-time context, like 'is sports
liked on Monday evenings after watching the news', and an even more specific query might add 'on channel 2' to the
latter query, and so on. If the last, and more specific, query was most likely, then a sports program on Monday evening,
after watching channel 2 news, would rate higher than the same sports program if these behavioral constraints were
not met. In the PDE query case, the closer (farther) a program is to the most likely behavioral constraint, the higher
(lower) the behavioral voting contribution is to the PDE rating.

[0118] In yet another aspect of Item 1,the BMQengine supports a query for the most likely transition given a modeled 
context parameters (I.e., EPG entries, timings, behaviors). The BMQengine responds with all probabilities that match 
the query terms, assuming unconstrained model dimensions are a wildcard. Thus, the BMQengine recursively applies 
the constrained dimensions across all unconstrained behavioral dimensions. For demonstration sake, we assume the 
BM models day-of-week (DOW), and time-of-day (TOD) for liked titles, channels, and genres. A typical query and 
response appears as follows. 

Query: 'what is the most likely genre transition from genre = weather, DOW = Monday, and TOD = evening'.

Exemplary response: 'genre = comedy, bias = medium'. 
[0119] If any term is not specified, then the BMQengine searches all entries of the unconstrained dimension for query
matches. For example, if DOW was omitted from the last query, then each evening would be searched for the most
likely transition genre from 'weather'. A possible query response is '[genre = drama, DOW = Tuesday, bias = high]', if
the most likely transition from genre = weather is to a drama on Tuesday. If the control term of the last query, 'most
likely', was changed to 'all', then every destination genre during any evening with a transition from weather would be
returned.
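
The wildcard behavior of such a transition query can be sketched in a few lines of Python. This is an illustrative sketch only, not the BMQengine itself: the observation log, the function name `query_transitions`, and the use of raw probabilities in place of the fuzzy bias labels (e.g. 'medium', 'high') are assumptions made for the example.

```python
from collections import Counter

# Hypothetical observation log: (from_genre, to_genre, day_of_week, time_of_day).
OBSERVATIONS = [
    ("weather", "comedy", "Monday", "evening"),
    ("weather", "comedy", "Monday", "evening"),
    ("weather", "news", "Monday", "evening"),
    ("weather", "drama", "Tuesday", "evening"),
    ("weather", "drama", "Tuesday", "evening"),
    ("weather", "drama", "Tuesday", "evening"),
    ("news", "sports", "Monday", "evening"),
]


def query_transitions(observations, from_genre, dow=None, tod=None, mode="most_likely"):
    """Return (to_genre, dow, probability) rows matching the query.

    A term left as None is treated as a wildcard, so every entry of that
    dimension is searched. mode="all" returns every match instead of the top one.
    """
    counts = Counter()
    for src, dst, day, time_of_day in observations:
        if src != from_genre:
            continue
        if dow is not None and day != dow:
            continue
        if tod is not None and time_of_day != tod:
            continue
        counts[(dst, day)] += 1

    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() orders rows from the most to the least frequent transition.
    rows = [(dst, day, n / total) for (dst, day), n in counts.most_common()]
    return rows[:1] if mode == "most_likely" else rows


# Fully constrained query, then the same query with DOW left as a wildcard.
constrained = query_transitions(OBSERVATIONS, "weather", dow="Monday", tod="evening")
wildcard = query_transitions(OBSERVATIONS, "weather", tod="evening")
everything = query_transitions(OBSERVATIONS, "weather", tod="evening", mode="all")
```

With this toy log, constraining DOW to Monday selects comedy, while omitting DOW lets the Tuesday drama transitions win, mirroring the example in the text.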

[0120] The same mechanism provides for finding the typical amount of time a user tends to hop away from a liked 
program. This behavior could arise from skipping commercials, or time sharing with consistently competing content. 
An example of a query to find the top 2 most likely hopping times for the program named "Seinfeld" appears as follows: 
Query: [QueryFunction = top_n=2, StateType = Title, fromStateID = null, toStateID = 'Seinfeld', TimeType = TIP,
TimeValue = null]. If the person usually only skips Ad breaks at 2 minutes, and at 15 minutes after the start of Seinfeld,
in 30% and 50% respectively of all transitions to the show, then the fuzzy time bin query response would be
[(QUICKLY_SURFING, 30); (VERY_SHORT, 50)].

[0121] Using transition context information enables the PDE to assign better program preference ratings, and permits
the PresAgent to order programs on a virtual channel in a user preferred program order and time.
[0122] A state-sequencing query, Item 2, addresses the likelihood that selected EPG entries are part of a pretended
state sequence, and returns the probabilities and states observed. A state-sequence is defined as any contiguous set
of state transitions greater than one. Each modeled state transition matrix has a companion state sequence table.
Instead of storing the actual permutation of observed state sequences, the state sequence table logs the states visited and
their frequencies. Since human behavior rarely repeats with exacting precision, it is more important to parameterize
transition sequences to match behavioral tendencies. Thus, there are two ordered tables. One two-dimensional table
has its rows as the unique combination of visited states in an observed sequence, and the columns are the IDs of the
visited states. The rows are sorted by column vector length, and the columns are alpha-numerically sorted. This sorting
speeds searching the table for a given query pattern. A corresponding table, effectively a third dimension, accumulates
the number of times a state was visited in the sequence. Each time the same combination of states is visited in a
transition sequence, the per state visitation frequencies are added to this count. An effective fourth dimension
accumulates the sequence lengths of sequences of matching visited states. A fifth, and final, sequence modeling dimension
counts the number of times a particular state sequence combination row occurs. This dimension is used to calculate
the average sequence length, and the average number of times a state in a sequence was revisited. A state visitation
sequence match occurs when the states visited in a new sequence exactly match a states-visited row entry. When
a match occurs, the state visitation frequencies, and sequence length, are added to their respective dimensional
accumulators. Otherwise, a new row entry is appropriately created.

[0123] Fig. 13 illustrates a typical example. The example is for channel states, but the algorithm applies to any state
sequence modeling. Fig. 13A shows an originally empty database after processing two sample state sequences. The
two sequences have at least one different visited state, resulting in two new row entries. In Fig. 13B, two additional
sequence examples are processed. Sequence three, although not the same as sequence one, is aggregated into the
same row entry as sequence one, since exactly the same states were visited. Sequence four differs by one state, thus
a new database entry is created. Sorting columns and rows continually, or periodically, makes query searches more
efficient.
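
The states-visited table described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `StateSequenceTable`, the dictionary layout, the sample channel sequences, and the choice to measure sequence length as the number of states in the sequence are all hypothetical.

```python
from collections import Counter


class StateSequenceTable:
    """One row per unique combination of visited states (order is not stored)."""

    def __init__(self):
        # key: sorted tuple of visited state IDs -> accumulators for that row
        self.rows = {}

    def add(self, sequence):
        """Aggregate a sequence into its matching row, or create a new row."""
        key = tuple(sorted(set(sequence)))
        row = self.rows.setdefault(
            key, {"visits": Counter(), "total_length": 0, "occurrences": 0}
        )
        row["visits"].update(sequence)        # per-state visitation frequencies
        row["total_length"] += len(sequence)  # accumulated sequence lengths
        row["occurrences"] += 1               # times this combination occurred

    def average_length(self, key):
        row = self.rows[key]
        return row["total_length"] / row["occurrences"]

    def average_revisits(self, key, state):
        """Average number of visits to `state` per matching sequence."""
        row = self.rows[key]
        return row["visits"][state] / row["occurrences"]


table = StateSequenceTable()
table.add([30, 40, 43])      # sequence one: new row (30, 40, 43)
table.add([30, 58])          # sequence two: new row (30, 58)
table.add([43, 30, 40, 30])  # sequence three: same states as sequence one
table.add([30, 40, 60])      # sequence four: differs by one state, new row
```

Sequence three is folded into the row created by sequence one even though its order and length differ, which is the aggregation behavior the paragraph above describes.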

[0124] A typical sequence (Sequ) likelihood query format appears as follows: 

[QueryFunction] [StateType] [sequStateIDs] [LengthValue]
[0125] The 'QueryFunction' term has the same parameters as the transition query case. Additional 'StateType'
attributes identify the type of state sequence to select, e.g.: ChannelSequ, GenreSequ, TitleSequ, SurfGenreSequ,
SurfChannelSequ, or any EPG entry such as ActorSequ, etc. The query constraint terms 'fromStateIDs' and
'toStateIDs' are similar to the prior transition query case, except they each are a list of either already visited, or yet to be
visited states, respectively. The query term 'LengthValue' constrains the average sequence length to search for. Some
typical sequence related BMQengine query examples follow.

Example 1:

[0126] Find the top 5 most likely channels that complete a non-surfing viewing sequence, given the previously visited
channels 30 and 40. The query constraints are

QueryFunction = top_n=5, StateType = ChannelSequ, sequStateIDs = [30, 40], LengthValue = null.
[0127] If channels 43, 58, and 60 were the most likely to complete the sequence, and their probabilities are 80, 10,
and 20 percent respectively, the query result would be:
[(43,80), (58,10), (60,20), null, null].

Example 2: 

[0128] Find the top 5 most likely channels that complete a surfing sequence three channels in length, given the 
previously visited channels 30 and 40, the query constraints are 

QueryFunction = top_n=5, StateType = SurfChannelSequ, sequStateIDs = [30, 40], LengthValue = 3.
[0129] If only channel 43 was the most likely to complete a surfing sequence three channels in length, and its
probability is 80, the query result would be

[(43,80), null, null, null, null].

Example 3: 

[0130] Out of all sequences of at least 4 programs, find the probability of watching the following three programs
sequentially: Friends, Frasier, and Seinfeld (assuming they were chronologically concurrent or consecutive).

QueryFunction = mostLikely, StateType = TitleSequ, fromStateIDs = ['Friends', 'Seinfeld', 'Frasier'], LengthValue = 4

[0131] A typical query result, if these query constraints were observed in 20% of all sequences with LengthValue at
least equal to 4, is [20].

[0132] In each of these examples, the channel, or sequence probabilities are calculated by simply counting the
number of times a query satisfying state was visited, out of the total number of similarly constrained records. A more
complete model of behavioral sequences additionally includes single transition information to statistically infer the most
likely permutation. The novel sequence modeling method set forth employs the 'states-visited' model information
above to infer observed combinations, and BM state transition information to estimate the most likely permutations,
i.e., sequence order. The advantage of this sequence modeling method is to significantly filter noisy behavioral
sequencing data and save memory, while preserving characteristic state sequencing information. It is well known that the
memory requirement of storing all possible permutations of a numerical sequence grows with the factorial of the sequence
length. In the TV environment this is often a prohibitive, and wasteful, use of limited resources. A pessimistic estimated
probability of a particular sequence order occurring is approximately equal to the product of the probability of matching
sequence parameters (i.e., length, states, and state visitation frequencies), and the probability of each transition
occurring. Since each sequence transition is not independent of the prior one, this estimate is clearly a lower bound.
A variety of heuristics are contemplated for comparing the relative likelihood of one permutation over another by
considering the directional bias of each transition. In short, a permutation is more likely if its transition directions have a
significant bias over the reverse direction. A simple heuristic to calculate an ordering likelihood metric is to sum the
difference of the forward minus the reverse direction transition bias for each sequence step.

[0133] With this information, a system module can estimate the expected likelihood that a state is in a specified
sequence. For example, if a TV user starts viewing CNN, and switches to FOX, a query to the BMQengine could return
the most likely channels to come, and their likelihoods. As shown in more detail herein, this information could be used
by the PDE to bias the preference of a program, or a program's sequential placement by the PresAgent in a virtual
channel.
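
The simple ordering heuristic named above (summing the forward minus the reverse transition bias over each step) might be sketched as follows. This is only one reading of that heuristic: the function names, the third state `"OTHER"`, and the transition bias values are hypothetical and chosen purely for illustration.

```python
from itertools import permutations


def ordering_score(order, transition_bias):
    """Sum, over each step, of forward bias minus reverse bias."""
    score = 0.0
    for src, dst in zip(order, order[1:]):
        forward = transition_bias.get((src, dst), 0.0)
        reverse = transition_bias.get((dst, src), 0.0)
        score += forward - reverse
    return score


def most_likely_order(states, transition_bias):
    """Pick the permutation of the visited states with the highest score."""
    return max(permutations(states), key=lambda o: ordering_score(o, transition_bias))


# Hypothetical single-transition biases taken from a transition matrix.
BIAS = {
    ("CNN", "PBS"): 0.6, ("PBS", "CNN"): 0.1,
    ("PBS", "OTHER"): 0.5, ("OTHER", "PBS"): 0.2,
    ("CNN", "OTHER"): 0.2, ("OTHER", "CNN"): 0.2,
}

best = most_likely_order(["PBS", "OTHER", "CNN"], BIAS)
best_score = ordering_score(best, BIAS)
```

Enumerating all permutations is only practical for short sequences; it is used here because the states-visited table already reduces each row to a small set of states.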

[0134] Transition reversal bias, item 3, seeks to detect any directional bias to a user's state transition behavior. An
example is if a viewer tends to move from CNN to PBS, but rarely from PBS to CNN. Similarly, for genres, a user may
prefer to more often watch news before long drama movies. Many such preferences may arise in Titles, or any EPG
entry. System modules sensitive to program sequencing can use reversal bias to predict the viewer's preferred
programming order. The PDE uses the directional bias to influence a program's preference rating according to the transition context
of a viewer's recent history. For example, assume a user just watched news, and the PDE must calculate the most
preferred programs in the EPG to suggest for viewing. In the case where the PDE otherwise rates programs on CNN
and PBS equally, it would rate the PBS program higher if there was a significant transition bias from CNN to PBS over
the reverse case. MemberAgent uses this as a behavioral parameter to identify classes of viewers. State directional
bias assists the PresAgent to better sequence the program guide for virtual channels.

[0135] Information such as typical time watched, item 4, per EPG entry and TV session, helps system modules better
match a viewer's attention span for specific types of content. If a viewer tends to have short TV viewing sessions, then
shorter programs get higher ratings than longer ones. If a user tends to watch action movies for a much shorter time
than comedies, then programs in the respective categories are incrementally preferred accordingly. In principle,
attention span applies to all EPG entries, and most principally to channel, genre, title, and actors. Attention span potentially
separates viewers (for the MemberAgent), and directly affects their preferred mix of content viewing times (for the PDE
and PresAgent).

[0136] Monitoring TV control patterns, listed in item 5, is a significant tool in identifying users behaviorally, and often
motivates program preference conclusions. A typical example is modeling a user's control behavior of the mute and
volume buttons. The PDE uses the mute button as an indicator of less preferred programming. In the context of Ad
watching behavior, the MemberAgent uses the mute button to learn types of Ads a user may not like. Other TV control
buttons, such as volume control, offer similar predictive potential. If a user raises the volume of a certain program,
then they are more likely to like that program. With respect to user identification, teenagers may be more likely to
significantly raise the volume of music videos than middle-aged adults. Similar modeling and query mechanisms as in
temporal modeling apply, except that the time interval planes are replaced by the appropriate control parameter intervals.
[0137] The same T^/Ta, Tn,/Ta, and T/T^ in program preference ratings, similarly apply to behavioral preference 
ratings, as in item 6. Through a user's viewing history, any EPG entry will have an inferred preference associated with the
user's program viewing behavior. For example, if a viewer is often late in watching programs with a certain actor, then
the state corresponding to this actor would have a high T^n/Tg ratio. Similarly, for every combination of PSF and EPG 
entry SSM. 

[0138] As a further metric, item 7 brings forward the importance of curiosity and diversity psychometric behavioral
parameters. Diversity measures seek to characterize a user's spatial coverage in each domain of interest. Any
modeled state domain receives a focus rating calculated by dividing the selections visited by the total selections available,
during a certain period of time; e.g., channel_diversity =
number_of_channels_visited / total_number_of_channels. Different behavioral state classes, or EPG entries, have their
own diversity measures; e.g., genre diversity etc. Different people tend to have a wide range of domain diversity
measures that characterize them. Older people may have fewer channels they watch than teenagers, for example. Similar
to diversity measures, focus measures apply to cross dimensional, often hybrid, domains. These hybrids are often
behavioral derivatives of EPG entries, and not direct measurements of EPG selection spreads. For example, Channel
Sequence and surfing focus measure how few unique states make up all observed sequences or surfing. Another
psychometric class is curiosity measures. Curiosity measures estimate a user's psychological tendency to explore a
particular category of content. For example, channel_curiosity is the ratio of the number of liked channels, out of all
channels non-surfed. The lower this ratio, the more the user explores channels that they have not previously liked. A
person who is not very curious would tend to stick only to things that they have liked in the past, and would have a
very low curiosity rating.
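
The two ratios defined above can be written out directly. The sketch below is illustrative only: the function names and sample channel numbers are hypothetical, and the second function returns the raw liked-to-non-surfed ratio exactly as worded in the text, without mapping it onto a final curiosity rating.

```python
def channel_diversity(channels_visited, total_channels):
    """Unique channels visited divided by the channels available."""
    if total_channels <= 0:
        raise ValueError("total_channels must be positive")
    return len(set(channels_visited)) / total_channels


def liked_to_non_surfed_ratio(liked_channels, non_surfed_channels):
    """Liked channels out of all channels viewed without surfing.

    Per the text, a lower ratio means the user spends more non-surfing
    time on channels they have not previously liked.
    """
    watched = set(non_surfed_channels)
    if not watched:
        return 0.0
    return len(set(liked_channels) & watched) / len(watched)


# Hypothetical viewing period: 4 distinct channels out of 100 available,
# of which 2 were already liked.
diversity = channel_diversity([2, 5, 5, 6, 100], total_channels=100)
ratio = liked_to_non_surfed_ratio({2, 5}, [2, 5, 6, 100])
```

The same pattern applies to other state domains, e.g. genre diversity, by substituting genre IDs for channel numbers.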

[0139] The preference determination engine PDE uses diversity, focus, and curiosity metrics to determine the
appropriate mix of highly rated programs to suggest for viewing. Taken together these measures control the breadth
and depth of programming predictions presented to the viewer. For example, if a viewer has a low genre diversity
rating, the PDE would concentrate top program suggestions within the fewest number of genre categories, and
conversely if the rating was high. People tend to have a predictable range of diversity and curiosity ratios. As used by
MemberAgent, the combination of these measures tends to separate individuals and the classes they belong to. With
respect to the PDE, if, for example, their channel curiosity rating is high, the PDE gives suggestion preference to
channels not previously watched often. The PresAgent, having a similar task as the PDE, creates a virtual program
viewing guide that tends to match the daily variation and novelty that a user prefers.

[0140] The most likely starting, or ending, state of item 8 is an important parameter for all modules. The PDE uses
starting or ending state likelihoods to bias program ratings according to user history when turning on and off the TV.
These parameters also help identify users for the TASAgent. The PDE couples ending state information with session
duration predictions to bias programs towards ending state preferences as the actual session time approaches and
surpasses the expected TV session end time. The PresAgent applies a similar principle when constructing a time
appropriate virtual programming guide. For example, if at a certain time and programming state, a user often ends their session
on a certain channel, then any programming on that channel will get a preference rating bias. In general, such rating
biases are relatively small, and are meant to give preference to programs that are otherwise closely rated, but have
some contextual bias, such as sequencing.

[0141] It will be understood that the above are exemplary embodiments and implementations of the invention only. 
A wide variety of permutations and variations on the preference metrics are well within the scope of this invention. 
[0142] The following description provides details concerning the behavioral model processing according to the
invention.

[0143] With reference to Figs. 2 and 3, each user action, or selected non-action, creates parallel SSM transition
events in each of three SSM state spaces: Channel, Genre, and Title. These state space categories each have
temporally dependent and independent dimensions. The first dimension, from top to bottom, is time independent, and
notes any state transitions whenever they occur. The second SSM dimension tracks the time_in_program, namely,
how long after the last transition a state transition occurs. Dimension three models transitions relative to the time
since the start of a TV session. The fourth SSM dimension detects time of day patterns of behavior, and the fifth, day
of the week patterns. The goal is to detect periodic sequential events that have some degree of relative temporal or sequential
bias. Each state space dimension has a transition matrix acting as fuzzy bins to quantify the temporal membership to
which a state transition sequence belongs.

[0144] For example, the Time of Day dimension for the channel, genre, and title SSMs consists of mutually exclusive
transition matrices that enter each state transition in one of the following time matrices:

Late_Night, Wee_Hours, Early_Morning, Morning, Late_Morning, After_Noon, Late_After_Noon, Evening, Night
[0145] Fig. 4 graphs an instance of the preferred state space, and the legal transition flows within each statistical 
state machine (SSM). Inside any of the probability density functions there exists only one state at a time. The SSMs
of the preferred embodiment are liked titles, channels, and genres; unliked channels and genres; and surf channels
and genres. Each SSM contains fixed and variable states. Fixed states describe transitions between SSMs.
[0146] The first fixed state for all SSMs is the OFF state. The OFF state occurs when the TV is off.
[0147] The SSMs modeling liked state transitions have as the second fixed state the UNLIKED state.
[0148] Conversely, the SSMs modeling unliked state transitions have as the second fixed state the LIKED state. Channels
and genres viewed for less than a certain threshold amount of time, e.g. 5 minutes, count as surfing transitions.

[0149] The non-surfing SSMs have a third fixed state called SURFING. The state SURFING is active when the user
views a program for less than the surfing threshold. The channel and genre surfing SSMs have only the OFF and
NOT_SURFING fixed states. Variable states for state spaces are ideally all states possible to visit. However, practical
resource constraints often significantly limit the number of states that can be fully modeled. One method to compress
the BM without significant errors is to only have the most representative, or preferred, of each state space included
and enumerated as variable states in the SSMs.

[0150] A preference determination engine (PDE), see Fig. 10, assigns preference ratings to titles, channels, and
genres. The maximum number of states that resources permit is taken from the top ratings in each category. States in
surfing SSM models are a union of liked and unliked SSM states. As with any state machine, the SSM can only be in
one state at a time. For example, when a viewer transitions from a liked channel to an unliked one, the 'to' state would
be the UNLIKED fixed state of the Liked Channel SSM, and the 'from' state in the Unliked Channel SSM is the LIKED
fixed state.

[0151] Transitions between variable states define the block named probability density function, or pdf, as it models
the likelihood for any particular behavioral state transition to occur. Variable states are added to and deleted from the
pdf depending on their statistical significance over time. In practice, most TV viewers may individually visit fewer than
30 of 100 channels and 50 of 100 genres over the course of six months. Using this observation, a systems designer
under limited system memory constraints can significantly reduce system resource requirements, and yet continue
to capture the vast majority of a user's behavior. Up to a certain limit, the present invention's performance is proportional
to, and thus scalable with, the number of top preferred states in the SSM. The lower limit on the number of required states
tends to come from minimum required performance on individuals that are very similar, but different in very subtle
ways. The upper limit is set by the diminishing performance benefit of adding states, versus the penalty of system
resource constraints. To find an optimal limit, a simulation sweeps model resource parameters over a statistically
representative population sample of TV viewing behaviors. Such memory usage optimizations are most critical in the TV
ProfAgent in a real-time model building mode. A certain number of 'temporary' states are continually necessary as the
ProfAgent builds enough evidence to determine which to include into the SSM, or reject as not active enough. Upon
reaching an available memory limit, the agent deletes the least preferred states as determined by a unique algorithm
in the PDE.



[0152] When a user makes a content transition, a state transition event is registered as described in each respective
SSM. An action-based transition is any explicit TV control button pressed, e.g., a channel change, or volume increase.
A non-action event occurs when content changes with no explicit user action, e.g., a new program on the same channel. In
the latter case, a new program event causes a self-transition on any other state that stayed the same, i.e., a steady
channel is a channel state self-transition. Another possible case is no channel change, but the same liked program
title repeats itself. In this case, all SSMs will have a self-transition in their last state. Importantly, self-transitions to
programs that are short enough to be otherwise counted as surfing are counted as non-surfing transitions. This choice
follows the philosophy of heuristically modeling the user's behavioral intent.

[0153] Fig. 5 details a representative state space SSM matrix, of Fig. 4, and its operation. The rows represent the
'from' state, and columns the 'to' state of a state transition. The process may be referred to as a dynamic demographic
cluster knowledge base in terms of transition matrix and weight set (TNWS).

[0154] The transition matrix, in principle, describes the viewers' behavior in a kind of temporal form. The transition
matrix illustrated in Fig. 5 is a channel transition matrix. The dimension of the matrix is AxA, where A is the number of channels
available plus 2.

[0155] The number of different types of matrices is two: one for channel, as shown in Fig. 5, and one for genre.
[0156] There are two sets of matrices for each day of the week for every demographic group, i.e. there exist 14
matrices for channel group i (i = 1, 2, 3, ..., N; N the number of groups). One set is for watching activities, another for surfing.
[0157] The matrix in Fig. 5 shows the following transitions: On -> Ch. 5 -> Ch. 2 -> Ch. 6 -> Ch. 100 -> Off. An item in
the matrix (A, B, C, D, E) is the median of all [illegible] demographic group for the action, e.g. the transition
from channel 5 to channel 2. Items in the on-column and off-column, and those in the matrix for surfing, are counts for the action.
[0158] Based on the transition matrix it is possible to predict a demographic group. After building the transition matrix,
weight sets have to be optimized for all demographic groups. The optimization is based on maximum entropy theory
and reinforcement learning.

[0159] On the client side, the prediction is effected the same way unless there is a memory restriction. If that is the
case, entropy evaluation will be used to eliminate those columns which are less important. The weighting items
discussed later are optimized and fixed for the client to use. However, if the client has more computation power than it
can consume, it can optimize those weight sets and keep them locally.

[0160] By way of example, we take a watched channel transition matrix. $H_{C_{i,j}}$, where i is a channel and j a group, is the
entropy of a 'to' channel, i.e., a column, in a channel transition matrix. Note that the lower the entropy is, the higher
the information content of the column. A lower $H_{C_{i,j}}$ means that there exists a valuable 'to' transfer value for this
channel. Ideally, the $H_{C_{i,j}}$, i = 1, 2, 3, ..., N, j = 1, 2, ..., M, with N the number of channels and M the number of groups, are different for
groups, which means that the transition matrix will help to identify the viewer's demographic group.
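
As a rough illustration of the column entropy discussed above, the following sketch computes the Shannon entropy of one 'to' column of a count matrix. The exact entropy formula is not legible in the source, so the use of Shannon entropy over normalized column counts is an assumption, as are the function name and the sample matrix.

```python
import math


def column_entropy(matrix, col):
    """Shannon entropy (in bits) of the 'from' distribution within one 'to' column.

    `matrix[r][c]` is the count of transitions from state r to state c.
    A low value means arrivals at this state come from few 'from' states,
    i.e. the column is highly informative.
    """
    counts = [row[col] for row in matrix]
    total = sum(counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical 3x3 watched-channel transition counts (rows: from, columns: to).
COUNTS = [
    [0, 4, 1],
    [0, 0, 1],
    [0, 0, 1],
]

focused = column_entropy(COUNTS, 1)  # every arrival comes from one row
spread = column_entropy(COUNTS, 2)   # arrivals spread evenly over three rows
```

Under a memory restriction, columns could then be ranked by this value and the highest-entropy ones dropped first, in line with the entropy evaluation mentioned for the client side.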

[0161] The final goal is to obtain a probability $P_j$ for a viewer, and then to pick a group j as the predicted group for the viewer.

[0162] The probability $P_j$ can be derived from two sets of transition matrices together with the transition matrix of the viewer as
follows:

[0163] First, calculate $H_{X_{i,j}}$, where X ranges over channel and genre, each for both watching and surfing, i.e. X takes 4 values.

[0164] Calculate the corresponding quantity (symbol illegible in the source), which is the entropy of 'from', i.e. the rows.

[0165] Then, calculate the probability distribution for both columns and rows.
[0166] Next, create a weight matrix $W_j$ of every transition matrix for all groups. The items $w$ in the matrix are

$$w = P_{c}(a) \cdot P_{r}(a)$$

where $w$ is the weight for one transition action $a$. The subscripts are only partly legible in the source; $P_{c}$ and $P_{r}$
are taken here to denote the column and row probability distributions of paragraph [0165].

[0167] Now, calculate a weighted item distance between knowledge base and viewers' transition matrix



where L is one of the transition matrices. The terms and are L matrices for knowledge base and viewer respec- 
tively. 

[0168] Finally, reinforcement learning should be utilized, e.g. Monte Carlo type, to optimize $W_j$ for the best result.
A good result is that the viewer's group is predicted correctly.



[0169] There are two types of receiver with which the system can operate, one with a feedback channel, one without.
The optional feedback channel is indicated as a dashed line in Fig. 1. If a receiver with a feedback channel provides
demographic information, the task of demographic group based advertisement is quite straightforward, as will be
described in the following. Also, the performance of the advertisement is easy to measure. If the receiver does not provide
demographic information of the viewers, it will be treated the same way as a receiver without a feedback channel, apart
from the way of gathering the performance, which can be obtained by the feedback channel directly.
[0170] For receivers without the feedback channel, two methods are provided for determining the viewer's demographic
group. Which one to use depends on which of the two gives the closer match, as measured by the maximum entropy measure
on $P_j$ of the above discussion. The method which produces the smaller entropy value should be used. The performance
of those receivers without a feedback channel should be measured by market research, then put into the learning circle
in the same way as those with feedback channels.

[0171] The real-time feedback of a viewer's action with demographic and performance information, of course, exists only when
a back channel exists for the viewer's receiver. It contains the viewer's behavior information stream and
the performance of the demographic prediction. The behavior stream should contain at least the items listed above,
namely, watch_date, watch_start_time, watch_duration, watch_channel, and demographic information. The
performance is a temporal list to indicate whether the receiver's prediction is right or not.

[0172] Since the demographic information of the receivers is known, the knowledge base acquirer (see central box
in Fig. 1) can make changes based on the feedback information.

[0173] In summary, therefore, Fig. 5 is a representative non-temporal state space SSM matrix, and its operation.
The exemplary channel state space uses the transition from OFF to channels 5, 2, 6, and 100, and back to the OFF
state, starting from an empty matrix. The ProfAgent increments the appropriate transition matrix entry for each state
pair. For the present example, the following (from, to) matrix entries would be incremented: (5, on), (5,2), (2,6), (6,100),
and (100,off). The number of times visiting channel 5 is the sum of all of the entries in column 5; the global probability
is that number divided by the matrix total. Once on channel 5, the probability to go to channel 2 is the count of (5,2)
divided by the sum of row 5 (channel 5 'from' states).
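
The counting and the two probabilities described in this paragraph can be sketched as below. This is a simplified illustration, not the ProfAgent: it models turning the TV on and off as transitions from and to a single hypothetical `"OFF"` state rather than reproducing the separate on and off columns of Fig. 5, and the function names and the second sample session are assumptions.

```python
from collections import defaultdict


def record_session(counts, states):
    """Increment the (from, to) entry for each consecutive state pair."""
    for src, dst in zip(states, states[1:]):
        counts[(src, dst)] += 1


def visit_probability(counts, state):
    """Column sum for `state` divided by the matrix total."""
    total = sum(counts.values())
    into_state = sum(n for (_, dst), n in counts.items() if dst == state)
    return into_state / total if total else 0.0


def transition_probability(counts, src, dst):
    """Entry (src, dst) divided by the sum of row `src`."""
    row_total = sum(n for (s, _), n in counts.items() if s == src)
    return counts.get((src, dst), 0) / row_total if row_total else 0.0


counts = defaultdict(int)
record_session(counts, ["OFF", 5, 2, 6, 100, "OFF"])  # the session from the text
record_session(counts, ["OFF", 5, 6, "OFF"])          # a second, hypothetical session

p_visit_5 = visit_probability(counts, 5)
p_5_to_2 = transition_probability(counts, 5, 2)
```

A sparse dictionary keyed by (from, to) is used instead of a dense AxA array because, as noted earlier in the text, most viewers visit only a fraction of the available channels.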

[0174] A similar process governs the accumulation and analysis of higher dimensional transition matrices such as
temporal. The mechanism is identical, except the entry in the appropriate time interval plane of the transition is
incremented. An important difference between the pseudo HMM implementation and the theoretical HMM is that the state
transitions in the Markov chain are not necessarily independent of the last state. Thus, the probability to traverse
through a state sequence is not necessarily equal to the product of the individual state transition probabilities. It is a
lower bound, however. To the extent the user state selection behavior is uniformly random, the pHMM converges
towards the theoretical HMM. The lack of transition independence does not extinguish the utility of a partial random
state machine model, as user selection behavior tends to be Gaussian; however, it does require additional statistical
information to compensate. Sequence modeling, as described herein, is such an attempt.

[0175] The preference determination and the profile modeling described in the foregoing may be applied in a variety
of contexts. Here, we concentrate on the targeting of advertising content based on the preference ratings and profile
modeling.

[0176] Fig. 6 illustrates the advertising category cluster learning architecture that is applied in the targeting server.
The BCE creates m-clusters from m users from a particular advertising category training set. The Cluster Aggregator
block extracts the most representative aspects of the learned clusters and creates a typical profile of the group. After
training over n Ad categories, n typical advertising category behavioral profiles are created. That is, Fig. 6 depicts an
advertising category behavioral prototype learning system, i.e., the top level advertising category cluster learning
architecture. The module resides at the head-end Ad Manager inside the targeting server. Cluster learning is a continual
process of defining and optimizing advertising category groups (clusters), and their correlated behavioral profiles, based
on high quality tagged and sampled TV user logged data. The demographic and behavioral data are input from a
third party, and/or from field deployed units. A selection filter extracts the targeted advertising category as the training set
for the BCE. The BCE processes each user record in the training set as if they were from the same user, thus creating
a very large aggregate BM. The resulting BM is parameterized, and pruned to a subset of only highly biased dimensions
that serve as the representative behavioral signature for the advertising category. This step is referred to as
inter-prototype pruning, since it only removes bad, insufficiently biased, dimensions within a given BM. The BCE repeats
this procedure for each training cluster, until every advertising category group has a corresponding behavioral signature
profile, if one exists. A typical advertising category profile will exist if, and only if, there is at least one behavioral
dimension significantly biased over random. Each SSM has a corresponding set of novel parameterizations of generic,
characteristic state transition behaviors that tend to separate users.

[0177] Fig. 7 depicts the pruning phase of advertising category template building. This phase distances the prototypes
by removing the dimensions most in common among the categories. The second stage of advertising category behavioral
prototype building, herein called intra-prototype pruning or intra-profile pruning, removes dimensions
in each BM that are similar to all other corresponding BM dimensions. This step selects the most distinctive dimensions
across all targeting reference profiles, hence creating a minimal description length for each advertising category prototype.
If the result of this pruning process is to remove all, or significantly all, of a prototype's classification dimensions,
then the most similar Ad categories are merged into a single predictive class, thus diminishing the best targeting
resolution of the system to the merged Ad categories. For example, if insufficient dimensional distance separates three
male age groups in their 20s, 30s and 40s, then these Ad classes are merged into a single class of males between
20 and 40 years old. The targeting server sends the final targeting Ad profile prototypes, their category labels, and
expected prediction performance to the Ad Server.

[0178] The detailed aspects of the novel training, pruning, and merging process follow. The BCE begins the training 
process by building the BM with labeled user data from the selected targeting group members. The BCE calculates 
transition and certain behavioral patterns from each user action. Transition counting events supported by the BM include 
changes in any program EPG entries, occurring at various relative time measures. A minimal event training log appears 
as: 

<user ID> <date> <time > <channel> <genre> <title> <program length> 

[0179] Since the training set is a preexisting database, learning occurs in batch mode instead of real time. In batch
mode, all viewed programs within a certain learning window are rated and sorted at once. The BM simulation steps
the learning window over the user data, or learning period. With real-time data, a temporary holding area is necessary
while building sufficient evidence to include the transition in a statistical state machine (SSM). The learning window
determines the temporal performance of the system. On the server side, there is generally no memory limitation, thus
no need to delete possibly less preferred states to include new observations. The main effect of the learning window
is to estimate the client-side BM performance over a range of memory-limited user history periods. A goal of the present
invention is to identify stationary behavioral parameters and their typical variance. When
the probability density function for the random variable is a function of time, the random process is said to be
non-stationary. To determine the statistical stability period for the group non-stationary random process behavior, the
learning window is continuously adjusted from a few days to a few months. The approximate stationary period is the learning
window size that has the best correlation between window steps. Every advertising category potentially has a different
average stationary period. For example, senior citizens may have more repeatable behavior for a wider interval of
time than teenagers. The targeting server provides the Ad server with the expected learning times needed to approach
prediction stability and convergence for each advertising category.
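
The selection of an approximate stationary period can be sketched as follows. This is a minimal illustration, not the specification's procedure: it assumes that user behavior is summarized as one count vector per day (e.g. viewing counts per genre), that windows are stepped without overlap, and that "correlation between window steps" means the Pearson correlation of consecutive window histograms. The function names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def window_histograms(daily_counts, window):
    """Sum the per-day count vectors over consecutive, non-overlapping windows."""
    hists = []
    for start in range(0, len(daily_counts) - window + 1, window):
        chunk = daily_counts[start:start + window]
        hists.append([sum(column) for column in zip(*chunk)])
    return hists


def mean_step_correlation(daily_counts, window):
    """Average correlation between each window histogram and the next one."""
    hists = window_histograms(daily_counts, window)
    pairs = list(zip(hists, hists[1:]))
    if not pairs:
        return 0.0
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def approximate_stationary_period(daily_counts, candidate_windows):
    """Return the candidate window size with the best step-to-step correlation."""
    return max(candidate_windows,
               key=lambda w: mean_step_correlation(daily_counts, w))
```

With data that alternates between two daily patterns, a one-day window correlates poorly from step to step, while a two-day window repeats almost exactly, so the two-day window would be chosen.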

[0180] Fig. 8 diagrams the real-time advertising category estimation system. The MemberAgent compares the real-time
TV user's usage behavior to the advertising category templates and calculates a probability distribution of the
user's advertising category.

[0181] Fig. 9 outlines the TV Ad targeting system according to the preferred embodiment. The TASAgent receives
Ads from the TV head-end, and interprets the Ads' targeting metadata. The TASAgent compares the target audience,
specified by the Ad's targeting query expression, against items selected from the household users' advertising category
predictions data, and produces a targeting rating that the TASAgent and PresAgent use to determine which Ads should
be stored and displayed, respectively.

[0182] Referring now to Fig. 10, there is illustrated the preference determination engine architecture according to
the invention. The profiling agent (ProfAgent) incrementally updates the behavioral model BM with each content change
event. Initially, the event is decomposed into its states, if any, and temporal relationships. Liked states are any modeled
aspects, characteristics or usage, associated with a liked program. The ProfAgent receives program preference ratings
from the preference determination engine (PDE). The PDE determines a liked program by evaluating the voting network
in Fig. 10. There are three main components to the PDE: real-time content and context preference learning
(ProfAgent), preference prediction (PredictAgent), and a BMQagent. A description of the PredictAgent follows the ProfAgent
overview. The output is a perceptron-like weighted linear fuzzy voting combination of the previously enumerated
preference sensing filters (PSF):

The ProfAgent Learning Algorithm

• Initialization

1. Set all n weights equal to 1/n.

2. Set $\rho$, $\delta$, and $\eta$ to conservative estimates (e.g., $\rho_p = \rho_n = 0$, $\eta_p$ = ..., $\eta_n$ = ...),

where $\rho$ is a trial-dependent learning momentum term and $\eta$ is a trial-independent learning rate. The n
and p subscripts correspond to a negative and a positive event, respectively.

• Calculation of Output Activation O

3. $O = \sum_{j=1}^{n} W_j V_j$

where $V_j$ is the fuzzy output vote of PSF j.

• Weight Training

4. For each positive event:

$W_j(t+1) = W_j(t) + \eta_p W_j(t)$, for all $V_j \ge \theta$

$W_j(t+1) = W_j(t) - \eta_p W_j(t)$, for all $V_j < \theta$

5. For each negative event:

$W_j(t+1) = W_j(t) - \eta_n W_j(t)$, for all $V_j > \theta$

$W_j(t+1) = W_j(t) + \eta_n W_j(t)$, for all $V_j < \theta$

$\theta$ is the minimum fuzzy liking vote threshold.

• Update learning rate

6. If a positive event:

$\rho_p(t+1) = \rho_p(t) + \rho_p(t)\,(O(t) - O(t-1))/\mathrm{MAX\_VOTE}$

7. If a negative event:

$\rho_n(t+1) = \rho_n(t) + \rho_n(t)\,(O(t-1) - O(t))/\mathrm{MAX\_VOTE}$

$\delta$ is a frequency reinforcing term.



[0183] The ProfAgent adjusts weights of the single-layer, n-node network according to a reinforced learning scheme.
The n weights, preferably user specific, are initialized equally to 1/n; i.e., all PSFs have an equal vote. Each time the
user visits the same program, the training regime reinforces nodes that vote the program as liked, and penalizes the
rest. This philosophy confirms preference predictions with observational frequency. To encourage stability and
convergence, a learning rate $\eta$ applies an incremental reinforcement signal to adjust weights.

[0184] Two learning rates govern the training process: a negative ($\eta_n$) and a positive ($\eta_p$) event rate. A positive event
is when the user selects the program, and a negative event is when a program was available in a previously liked
context, but another program was chosen. In practice, there is more causal information in positive examples than in
negative ones. Hence, $\eta_p$ is normally much higher than $\eta_n$. The momentum term $\rho_p$ ($\rho_n$) increases (decreases)
training rewards when the preference voting output indicates an increasing (decreasing) preference trend between
positive (negative) events, and lowers $\eta_n$ ($\eta_p$) to reflect a positive (negative) learning bias. The $\delta$ learning rate term
reinforces program visitation frequency. The reinforcement is positive for each positive event, and negative otherwise.
Over time, the present preference learning system automatically learns the PSFs that best predict program preference,
and de-emphasizes the rest. The learning rates are adjusted to approximate the time constant of user periodic preference
shifts. All PSFs output a fuzzy preference rating that ranges in steps from a minimum rating value (e.g., HATES_PROGRAM
= 1) to a maximum (e.g., TOP_PROGRAM = MAX_VOTE = 6).

[0185] $\theta$ is the value of the minimum fuzzy membership that indicates at least a program liking (e.g., $V_j$ =
LIKES_PROGRAM/MAX_VOTE = 4/6 = .66). PSF reinforcement is determined by comparing a PSF's vote to $\theta$. A vote
is positive when $V_j > \theta$, and is negative otherwise. The effect is to reward a PSF voting weight if it voted positive
(negative) during a positive (negative) program viewing event, and penalize it otherwise.
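
The weight training rules described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the specification's implementation: the class name and the learning-rate values are hypothetical, a vote exactly equal to θ is treated as a liking vote, and the momentum (ρ) and frequency (δ) terms are left out.

```python
from dataclasses import dataclass, field

MAX_VOTE = 6
THETA = 4 / MAX_VOTE  # LIKES_PROGRAM / MAX_VOTE, the minimum fuzzy liking vote


@dataclass
class VotingNetwork:
    """Single-layer network with one voting weight per preference sensing filter."""
    n: int
    eta_pos: float = 0.10  # assumed value; the text only says eta_p >> eta_n
    eta_neg: float = 0.02  # assumed value
    weights: list = field(default_factory=list)

    def __post_init__(self):
        # All PSFs start with an equal vote of 1/n.
        self.weights = [1.0 / self.n] * self.n

    def output(self, votes):
        """Output activation O: weighted sum of the fuzzy PSF votes."""
        return sum(w * v for w, v in zip(self.weights, votes))

    def train(self, votes, positive_event):
        """Reward PSFs whose vote agreed with the event, penalize the others."""
        eta = self.eta_pos if positive_event else self.eta_neg
        for j, vote in enumerate(votes):
            voted_liked = vote >= THETA  # assumption: ties count as liking
            agreed = voted_liked == positive_event
            self.weights[j] *= (1 + eta) if agreed else (1 - eta)
```

After a positive event, every PSF that voted at or above θ gains a fraction η_p of its current weight and every other PSF loses the same fraction; a negative event applies η_n with the roles reversed. The weights are not renormalized here because the text does not mention that step.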

[0186] The PDE calculates the program, or any state type, preference rating, R, as follows:

• if the program is already a state in a SSM:



$O_i$ is the chronologically ordered array of preference votes for the program;
$l$ is the number of recent ratings for the program, so that $l$ points to the current vote O;
q is the number of past ratings to include in the moving average;
c is a temporal weighting coefficient array, where $c_i \le 1$;
$\lambda(l)$ is a trial-independent, but frequency-dependent, biasing term.

• if the program is not already a state in a SSM:

R = 0



[0187] The preference rating R of a program is a weighted moving average of the current and past program preference
votes. A system designer skilled in the art sets the $c_i$ distribution to the desired temporal bias of past ratings. Typically,
time diminishes the value of past information, hence ratings decrease in value over time. $\lambda(l)$ proportionately increases
the overall rating according to the number of times the program has been rated. $\lambda(l)$ is preferably a small and slow-growing
function of viewing frequency; e.g., $\lambda(l)$ = c·log($l$) = [0 .03 .05 .1 .12 .15 ...]. The effect of $\lambda(l)$ is to indicate
greater preference for a program the more times it is watched, independent of the PSF votes.

[0188] For example, if a viewer always watches only the first 20 minutes of the Tonight Show, the program's rating
would be low; however, the frequency term $\lambda(l)$ would increasingly raise the rating, say logarithmically, to reflect the
consistency of viewing preference; e.g., a 3% higher rating after 3 times, and 12% after 5 viewings, etc. All state types
related to a program inherit the program's effective preference vote, O.

[0189] For example, if the program has a LIKED rating, then its genre, channel, actor, etc. likewise receive a LIKED
rating. The PDE calculates a state candidate's preference rating, R, as described for the Title example above, shown
in Equation (1). For example, if a user watched two comedy series, 'Seinfeld' with a LIKED rating, and then a good
while later the user watched 'Friends' with an INTERESTED rating, then $O_1$ = LIKED = 4 and $O_2$ = INTERESTED =
3, and the comedy_series state preference rating is (with $c_1$ = .9, $c_2$ = 1, $\lambda(2)$ = .03):

[0190] In this case, since there was a long time in between program viewing events, the older vote was reduced by
10%. However, since the same genre was viewed twice, the rating received a 3% increase. The same preference rating
algorithm applies to any state type.
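
The comedy-series example can be worked through numerically. Equation (1) is not reproduced in this text, so the sketch below is only one reading that is consistent with the prose: a moving average of the temporally weighted votes, scaled up by the frequency term. The normalization by the number of votes and the function name are assumptions.

```python
def preference_rating(votes, coeffs, freq_bias):
    """Weighted moving average of votes, raised by a frequency bias.

    votes     -- preference votes, oldest first (e.g. LIKED = 4, INTERESTED = 3)
    coeffs    -- temporal weighting coefficients c_i, one per vote
    freq_bias -- lambda(l), the frequency-dependent biasing term
    """
    if not votes:
        return 0.0  # the state is not yet in an SSM
    weighted_sum = sum(c * o for c, o in zip(coeffs, votes))
    return (1 + freq_bias) * weighted_sum / len(votes)


# 'Seinfeld' (LIKED = 4, discounted by 10%) followed by 'Friends' (INTERESTED = 3)
rating = preference_rating([4, 3], [0.9, 1.0], 0.03)  # approximately 3.399
```

Under this reading the older vote contributes 3.6 instead of 4, the average of 3.6 and 3 is 3.3, and the 3% frequency bonus lifts the comedy_series rating to about 3.4.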

[0191] For each program viewed, the PDE passes a candidate state preference rating to the ProfAgent to update 
the BM according to the prescribed algorithms. The PDE directs the BCE to process all user training data, as if from 
one user, thus creating a single BM including behavioral information for all class members. 

[0192] The voting output of the ProfAgent is stored in a voting history database (VOTEdbase), which the ProfAgent
uses, as described herein, to determine the most preferred states to keep in the BMdbase. The PredictAgent also uses
the VOTEdbase when responding to requests by the Preference Determination Agent (PrefAgent) for rating content
parameter preferences. The PrefAgent and its function are described in the commonly assigned, copending patent
application [Docket No. 155785-0006/P01862, based on provisional applications 60/215,450 and 60/226,437]. The
disclosure of the copending application is herewith incorporated by reference.

[0193] A recording manager causes the recording of programs by periodically initiating a recording sequence. For
that purpose, the recording manager sends a request to the preference agent PrefAgent for ratings of all programs at
a particular time (X), or alternatively, for ratings of all programs within a particular time period (X). In certain
embodiments, the frequency with which the steps are performed may be changeable by the user. The preference agent
responds by providing ratings, from a preference database, for each program received from the recording manager. The
recording manager then causes recordation of the programs at time X, or within time period X, in accordance with the
ratings received from the preference agent.

[0194] The preference agent monitors the viewing selection of the various viewers using the control system and
creates viewing profiles of each viewer that are stored in the preference database. Based upon these profiles, the
preference agent sorts through the incoming programming content as described in the EPG information to compile
lists, such as "Top 10" lists, of viewing choices available at any given time to each viewer, and directs the recording
manager to record the top-ranked program being broadcast at any given time (including any programs selected by the
viewers for recording) and store it in a stored programs memory device. The preference agent further contains software
that allows it to create a demographic profile for each viewer based upon the viewing profile of the viewer and certain
algorithms or associative rules. These algorithms may be adjusted over time as the model employed by the system
administrator is enhanced and its accuracy improves. To this end, the system update information channel included in
the broadcast signal may include periodic software updates, including new preference database parameters that may
need to be included at the request of the advertising suppliers. Thus, in one embodiment the control system may be
remotely upgraded to meet any new demands that may arise as advertising content providers become familiar with
the system and the process of custom tailoring narrowly focused, targeted advertisements. The demographic profile
created for each viewer is stored in a demographic database, which resides in the control system and thus ensures
the viewers' privacy.

[0195] The preference agent also sorts through the advertising content streaming in through multiple advertising
channels contained within the broadcast signal and, based upon the demographic profiles of the viewers and the meta
data contained in each advertisement to describe the target audience for the particular advertisement, stores and/or
causes the display of particular advertisements. The control system may utilize any of a variety of methods to manipulate
the advertising content, as described below.

[0196] The PredictAgent combines preference voting history information with contextual BM preferences to produce
a rating that the TASAgent, and other system modules, use to make preference-related decisions.

[0197] PredictAgent aggregates historic votes to produce an overall rating for the modeled state. PredictAgent has
the same learning architecture as ProfAgent. Unlike the ProfAgent, however, which learns feature-to-feature contribution
metrics, PredictAgent learns the optimal instance-to-instance statistical parameters. Instead of PSF inputs, there are
three voting history statistical inputs (sample count, sample max, and sample min) and their respective voting weights,
as follows:

■ cntPctCoef, weight for the number of times the state was visited

■ maxPctCoef, weight for the maximum vote ever observed

■ minPctCoef, weight for the minimum vote ever observed

[0198] The following is the pseudo code for the preference rating calculation, in the exemplary title state case: 

validStates = find(cntLTitleVote);              % indices of title states with at least one vote
TeffMinVote = avgLTitleVote - sdvLTitleVote;    % conservative base: average vote minus standard deviation
maxTcnt = max(cntLTitleVote(validStates));      % normalizing maximum over all visitation counts
maxTmax = max(maxLTitleVote(validStates));      % normalizing maximum over all max votes
TcntAdj = log(cntLTitleVote(validStates))/log(maxTcnt) - ...
          log(mean(cntLTitleVote(validStates)))/log(maxTcnt);
TmaxAdj = log(maxLTitleVote(validStates))/log(maxTmax) - ...
          log(mean(maxLTitleVote(validStates)))/log(maxTmax);
TminAdj = (minLTitleVote(validStates) - mean(minLTitleVote(validStates)))/MAX_TITLE_VOTE;
TeffVote = TeffMinVote + TeffMinVote.*(TcntAdj*cntPctCoef + TmaxAdj*maxPctCoef + TminAdj*minPctCoef);

[0199] The votes are assumed to have a Gaussian distribution, and a conservative rating is desired. TeffMinVote is
the average vote reduced by the standard deviation of all votes. This is a voting cluster classification cut-off. maxTcnt
and maxTmax are normalizing maximums over all state visitation counts, and highest max vote. Learned adjustment
factors bias TeffMinVote according to the learned adjustments TcntAdj, TmaxAdj, and TminAdj. These vote adjustment
parameters range from zero to one, grow logarithmically with stimulus, and are further normalized by their respective
average value. Such a rating policy favors consistently high observed ratings over unstable preference ratings that
may average high. The state count adjustment factor gives a positive (negative) bias to more (less) frequently watched
states. Thus, a one-event high preference vote for an action movie could score lower than a daily average vote for a
comedy series. This equation term helps overcome the case where the preference rating is, for some reason,
inaccurately low, but the user's repeated behavior warrants a higher vote. The TmaxAdj term helps make the preference
rating meaningful relative to the user's preference range. That is, if a user has never demonstrated a very high rating,
possibly due to poor system performance, ratings that approach the user's personal maximum should be biased upward
to indicate a relatively high score for this user. This is especially useful when comparing ratings between users in a
multi-user ID case, for example. The TminAdj vote activation level biasing term favors state votes that deviate
significantly above the average minimum vote over all related states. This is a soft greedy vote skewing strategy that uses
the user's lowest responses as a reference point to infer high confidence in higher relative ratings.
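
The rating calculation can also be sketched in executable form. This is an illustrative transcription of the pseudo-code, not a definitive implementation: it assumes that the average and standard deviation are kept per state, that every recorded vote is at least 1 (so the logarithms are defined), and that states are passed as dictionaries with the hypothetical keys `title`, `cnt`, `avg`, `sdv`, `max` and `min`. The guard for a maximum of 1 is an addition to avoid dividing by log(1) = 0.

```python
from math import log
from statistics import mean

MAX_TITLE_VOTE = 6


def _log_adjust(values):
    """Log-scaled value, normalized by the maximum and centred on the log of the mean."""
    top = max(values)
    if top <= 1:
        return [0.0] * len(values)  # log(top) would be zero; no adjustment possible
    norm = log(top)
    centre = log(mean(values)) / norm
    return [log(v) / norm - centre for v in values]


def effective_votes(states, cnt_coef, max_coef, min_coef):
    """Return {title: effective vote} for every state that has at least one vote."""
    valid = [s for s in states if s["cnt"] > 0]
    if not valid:
        return {}
    cnt_adj = _log_adjust([s["cnt"] for s in valid])
    max_adj = _log_adjust([s["max"] for s in valid])
    mean_min = mean(s["min"] for s in valid)
    result = {}
    for s, c_adj, m_adj in zip(valid, cnt_adj, max_adj):
        min_adj = (s["min"] - mean_min) / MAX_TITLE_VOTE
        base = s["avg"] - s["sdv"]  # conservative rating: mean minus one deviation
        result[s["title"]] = base + base * (
            c_adj * cnt_coef + m_adj * max_coef + min_adj * min_coef)
    return result
```

For instance, with only the count weight active, a comedy series watched 30 times with an average vote of 4 (deviation 0.5) ends up rated above an action movie watched once with a vote of 5, which mirrors the behavior described in the preceding paragraph.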

[0200] The PDE uses the PredictAgent's global adjusted preference ratings to determine the states to replace with
more preferred states when a maximum state count is reached due to memory limitations, if any. Hence, the BM is
continually updated to contain the most relevant states (potentially all visited states, if memory permits).

[0201] The next step of the advertising category prototype building process is to parameterize the BM into a pseudo-Euclidean
space. Since modeling data structures in the BM are not one-dimensional Gaussian distributions, determining
distance between two BMs is a difficult and inaccurate procedure when using prior art techniques. Prior art
techniques assume sampled data has a bell-curve-shaped distribution, and model the data as Gaussian, defined by a mean,
$\mu$, and variance, $\sigma^2$. However, as shown in Figs. 11, 13, and 14, samples in various modeled categories are not normally
distributed, but exponential, beta, uniform, delta, or multi-modal. Importantly, transition matrices do not lend themselves
to standard distance metrics required to determine cluster membership. Known classification methods define a Euclidean
feature space consisting of cluster neighborhoods centered at the cluster means, $\mu$, with cluster boundaries
extending $\sigma$ from $\mu$. The Mahalanobis distance is traditionally used to discriminate cluster membership. The
Mahalanobis distance is simply the Euclidean distance divided by each cluster's dimensional $\sigma$, or

$$d(x, \mu) = \sqrt{(x - \mu)^{T}\, \Sigma^{-1}\, (x - \mu)}$$

where $\Sigma$ is the mutual covariance matrix.

[0202] This method is very inaccurate, and impractical in TV systems. It is inaccurate for two primary reasons: it
falsely assumes Gaussian sample data, and the inversion of the covariance matrix introduces significant floating point
round-off errors that often render the matrix singular. In high-dimensional space, e.g. over 100 dimensions, calculating and inverting
a covariance matrix can be prohibitive in CPU time and memory. In a typically sparse sample matrix, many
unnecessary cross-correlation terms must be manipulated. Standard methods are similarly not applicable to determining
the distance between corresponding SSMs. Thus there is a need for a novel strategy to represent multi-modal
clusters and distances between them.
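
For reference, the prior-art measure under discussion can be illustrated in its simplest form. The sketch below assumes a diagonal covariance matrix, in which case the Mahalanobis distance reduces to the Euclidean distance with each dimension divided by that cluster dimension's σ, as the description puts it. The function name is hypothetical; the full-covariance case, with the matrix inversion criticized above, is deliberately left out.

```python
from math import sqrt


def mahalanobis_diagonal(x, mu, sigma):
    """Distance from point x to a cluster with per-dimension mean mu and deviation sigma.

    Assumes the covariance matrix is diagonal, so no matrix inversion is needed.
    """
    return sqrt(sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mu, sigma)))


# A point 3 units away along a tight dimension (sigma = 1) and 4 units away
# along a looser one (sigma = 2) lies sqrt(3**2 + 2**2) = sqrt(13) deviations out.
distance = mahalanobis_diagonal([3.0, 4.0], [0.0, 0.0], [1.0, 2.0])
```

Even in this reduced form the measure presumes a single bell-shaped neighborhood per cluster, which is the assumption the text identifies as failing for multi-modal behavioral data.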

[0203] The BM is parameterized into three general classes of behavioral dimension data types: histogram, scalar,
and discrete. To represent SSMs in a classification space, general, instance-independent behavioral patterns are identified
and extracted as dimensional classification parameters. Each matrix parameter is a dimension in a pseudo-Euclidean
classification space.

[0204] Some typical SSM parameter categories are:

1. Transition bias histogram

2. Self-transition bias histogram

3. Turn-on (off) state type bias histogram

4. Transition reversal bias histogram

5. Single transition ratio

6. SSM matrix sample confidence

7. Bias to top n states

8. Top n states

[0205] Items 1 through 4 are distributions of observed bias for the corresponding behavioral patterns. Item 1 represents
the amount of bias for transitions to occur over random. Another important behavioral category, item 2, is how
likely transitions are back to the original state, i.e., going from a comedy to a comedy. Item 3 captures a user's expected
session start or ending states, for all state types. Item 4 represents a distribution of bias levels to make a transition
biased in one direction over another. Some scalar parameters include the ratio of single to all transitions (item 5), matrix
non-random bias (item 6), and the bias to transition to the top SSM states.

[0206] The state sequence model, for all state types, has general parameterizations including:

[0207] Sequence length histogram. Ratio of unique sequence states to all states visited. Fraction of liked states out
of all sequence states visited. Sequence state focus. Maximum sequence length. Ratio of sequence transitions to all
single transitions.

[0208] Classification dimensions related to the hopping behavior are preferably parameterized as follows: T^^
histogram. Atn"a, histogram of program fraction times between hops. $\Delta t$, histogram of times between hops, and a
histogram of number of hops per program.

[0209] Finally, the system utilizes a variety of program-related feature dimensions. These dimensions include: $T_w$/trans,
time watched per transition histogram. $T_w$/prog, time watched per program histogram. Ad $T_w/T_a$, advertisement
time watched per time available histogram. $T_w$/session, time watched per TV session histogram. Viewed program start
time of day Tn^ATg T^/Tq, T/T^ histograms. And, number of unique states visited per time period.

[0210] A detailed listing of parameterized dimensions used in the BCE need not be provided within the framework
of this specification. Those of skill in the pertinent art will readily be enabled to establish the necessary parameter
dimensions, including variations, parameterizations, and extrapolations.

[0211] The novel bias calculation algorithm determines the qualitative evidence for a non-uniformly random selection
process and, hence, the likelihood for meaningful behavioral information. The expected uniformly random matrix bin
coverage is calculated using the binomial distribution. Each user action is viewed as a pass-fail event to fill a given
bin. The number of trials in the binomial experiments, or state transitions, is the number of transitions in the matrix.
The probability of an event success, or filling a particular matrix bin, is the uniformly random probability that any bin is
selected. The binomial probability for a given bin to be filled after a certain number of trials translates to the number
of bins in the matrix expected to be filled by a random process. Thus, the ratio of how many bins would be filled by a
uniformly random process to the actual number of bins filled indicates a biased, or non-uniformly random, process
behind state transition selections. The bias measure is additionally a quantitative indicator of a statistically significant
sample size. If there are not enough samples in the matrix to infer a non-random SSM transition process, the bias
measure is less than or equal to one. Prior art methods generally require n² samples in an n by n matrix to determine
if the covariance matrix is expected to be statistically significant. This requirement is prohibitive as n gets large. For
example, for a 30 by 30 matrix, traditional methods require 30x30, or 900, samples, which is impractical to obtain in a
short period of user TV usage. The reason prior art has this constraint is that it requires enough information to infer
confidence in all cross-correlation terms in the covariance matrix, since there is no a priori way to predict which are
significant, even if the vast majority of these terms are zero. The present matrix bias detection method determines
statistical significance continuously, and often converges on order n samples.

[0212] The following is a detailed computational description of the bias estimation technique, as applied to the
Liked_Channels transition matrix.

[0213] The pseudo-code function definitions include the following: 

sum(X) - sums the columns of matrix X; if X is an array, sums the elements to a scalar result.
length(X) - returns the greatest matrix dimension length of X.
find(X) - returns the indices of all non-zero elements of X.

X(1:5,1:5) - returns a sub-matrix X' that is rows 1 to 5, and columns 1 to 5 of X.

Y=binocdf(X,N,P) - returns the binomial cumulative distribution function with parameters N and P at the values in X.
union(A,B) - when A and B are vectors, returns the combined values from A and B but with no repetitions.
sqrt(X) - the square root of the elements of X.
sort(X) - sorts the elements of X in ascending order.

[0214] The following rules pertain to matrix uniform random calculations:

1. uniform probability to transition to a certain state from a certain state:

Ptrans_rand = 1/(NUM_LIKEDCHAN_STATES-1);

2. uniform probability to choose any possible transition (do not count the TV OFF state):

state_Prob = 1/sum(sum(LikedChanTransMtx(1:NUM_LIKEDCHAN_STATES,2:NUM_LIKEDCHAN_STATES)));

3. uniform probability to start or end a user's session in a certain state:

PonOffRand = 1/(NUM_LIKEDCHAN_STATES-1);

4. bias vector to start in a certain state (stateOFFbias similar):

stateONbias = (LikedChanTransMtx(START_VIEWING,1:NUM_LIKEDCHAN_STATES)/ON_SESSIONS)/PonOffRand;

5. bias over random to start up surfing:

ViewrLchOnSurfBias(viewer_idx) = stateONbias(SURFING);

6. bias over random to start up in an Unliked state:

ViewrChanOnUnlikeBias(viewer_idx) = stateONbias(UNLIKED);

[0215] Matrix sample concentration bias calculation follows these rules:

1. bias as a multiple over uniformly random for non-start-end state transitions:

bias_mtx = (LikedChanTransMtx(2:NUM_LIKEDCHAN_STATES,2:NUM_LIKEDCHAN_STATES)/Num_trans)/Ptrans_rand;

2. number of unique transitions between non-start-end states visited by user:

numFilledBins = length(find(bias_mtx(:)));

3. number of unique transitions between non-start-end states possible:

numBins2Fill = length(bias_mtx)^2;

4. fraction of possible bins actually filled:

mtxFillRatio = numFilledBins/numBins2Fill;

5. binomial bin selection expectation given number of samples and uniformly random success ratio:

ExpCoverage = 1 - binocdf(minHitsPbin-1,Num_trans,1/numBins2Fill);

6. actual bin filling success ratio observed:

ActCoverage = numFilledBins/numBins2Fill;

7. ViewrLchMtxConf(viewer_idx) = ExpCoverage/ActCoverage;
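
The start-state bias and the matrix confidence ratio from the two rule lists above can be sketched as follows. This is a hedged illustration rather than the specification's code: the function names are hypothetical, the transition matrix passed to `matrix_confidence` is assumed to already exclude the start/end state, and the number of trials is taken to be the sum of that sub-matrix. The binomial CDF is computed directly so that no statistics library is needed.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(min(k, n) + 1))


def start_state_bias(start_counts, on_sessions):
    """Bias over uniformly random for a session to start in each state.

    start_counts is the START_VIEWING row of the transition matrix; the row
    includes the start state itself, hence the "- 1" in the random probability.
    """
    p_on_off_rand = 1.0 / (len(start_counts) - 1)
    return [(count / on_sessions) / p_on_off_rand for count in start_counts]


def matrix_confidence(trans_mtx, min_hits_per_bin=1):
    """Ratio of expected uniformly random bin coverage to observed bin coverage."""
    bins = [count for row in trans_mtx for count in row]
    num_trans = sum(bins)
    num_bins = len(bins)
    num_filled = sum(1 for count in bins if count > 0)
    if num_trans == 0 or num_filled == 0:
        return 0.0  # no evidence at all
    exp_coverage = 1 - binom_cdf(min_hits_per_bin - 1, num_trans, 1 / num_bins)
    act_coverage = num_filled / num_bins
    return exp_coverage / act_coverage
```

Nine transitions that all fall into a single bin of a 3 by 3 matrix give a ratio well above one, whereas nine transitions spread evenly over all nine bins give a ratio below one, in line with the statement that a measure of one or less indicates no evidence of a non-random process.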

[0216] The matrix confidence ratio (MtxConf) indicates the likelihood of a non-random process bias. Thus, it tends
to give the confidence that a sample set is large enough to infer it has a non-uniform-random origin. Increase the
minimum hits, or successes, per bin (minHitsPbin) to increase confidence in an adequate minimum sample set size
(typically, minHitsPbin = 1 is practical). Inter-prototype, or local, dimensional pruning follows the BM parameterization
step in the advertising category prototyping process. High variance or, similarly, low bias dimensions are removed.
The most representative classification features are those that have a tight sample distribution spread. Features with
more uniformly spread data approach a uniformly random distribution, and are not as useful in cluster discrimination.
A typical pruning cutoff is one standard deviation for Gaussian modeled scalar features ($\sigma_{cut}$ = 1), and a bias ($P_{cut}$) less
than or equal to a uniformly random expected sampling spread, otherwise. The system designer achieves an increasingly
strict pruning criterion by decreasing $\sigma_{cut}$ and increasing $P_{cut}$. With too strict a pruning policy, valuable cluster
separation information is lost, which could result in an empty prototype by removing all dimensions. Too relaxed a threshold
results in losing classification performance by including many non-predictive features. The product of the inter-prototype
pruning phase is a preliminary advertising category template prototype. The preceding algorithms are applied
to each training set, creating a locally pruned, possibly empty, reference profile for each.

[0217] After local prototype pruning, global, or intra-prototype, dimensional pruning further removes superfluous
information. In this pruning stage, each advertising category prototype is compared to every other one, and dimensions
that do not separate any of the clusters are removed. To measure cluster distances involving non-scalar, non-Gaussian
dimensions, however, requires a novel method. Known methods define a sample point in a coherent, high-dimensional
space. However, the BM does not correlate, or preserve, feature values for each observation. Instead, all sample data
dimensions register the observed feature values into their respective distribution modeling histograms, as if they
occurred independent of time and of any other dimension. Thus, it is not possible to define clusters as sample points of
the BM in an n-dimensional Euclidean space. Importantly, this traditional classification clustering approach requires
exponentially more memory to store each sample point in feature space, and its transition history. Instead, the present
feature space is an n-dimensional pseudo-Euclidean construct that replaces absolute distances with relative correlations
between clusters. Since the sample points in each dimension's histograms have no cross-dimensional correlation,
no cluster has a spatial neighborhood representation. In high-dimension feature spaces, a cluster neighborhood is
mainly useful if the samples are Gaussian distributed, since the variance-adjusted cluster means are used to calculate
distances. However, in multi-modal distributions, i.e., not bell shaped, as is the present case, this representation has
little advantage, as Euclidean distance no longer applies in the traditional sense. Discrete feature variables, such as
program names, pose an additional complication in Euclidean space, in that they are not numeric analogs of the feature
dimensions, but set theory representations. To overcome the limitation of prior art, a new distance metric determines
if sufficient classification distance exists between two multi-modal clusters in feature space. The present classification
architecture replaces the Mahalanobis distance, or variance-adjusted Euclidean distance, of prior art with a dimensional
voting architecture that estimates cluster neighborhood overlap as a percentage of dimensions that vote the
overlap exists. This alone is not a large departure from current art; however, the metric of determining overlap between
non-scalar and non-Gaussian distributed clusters is novel. Again, there are three principal types of data, each handled
differently: scalar, histogram, and discrete, as defined herein. Scalar feature dimensions are modeled as Gaussian,
and handled in the standard $\mu$, $\sigma_{cut}$ neighborhood discrimination method. Classification distances between
corresponding histogram feature dimensions, however, are calculated as distribution correlations.

[0218] Distribution pseudo-correlation is defined as one minus the ratio of the distance between certain histogram
bins to the worst-case distance. This simulates the desirable correlation behavior of:

1. output values are between 0 and 1

2. output increases (decreases) linearly the more (dis)similar the distribution shapes and amplitudes.

[0219] The following commented procedural pseudo-code (in MatLab coding) determines if two histograms in a
feature dimension belong to the same class (discretionary cutoff values are set with exemplary values):

1. To get the worst-case distance, treat each histogram bin as an orthogonal Euclidean feature vector and calculate
the worst-case distance between them by placing all the samples of each in different bins.

    worst_diff(:) = 0;
    worst_diff(1) = hist1_all_samples;
    worst_diff(NUM_HIST_BINS) = hist2_all_samples;
    worst_distance = sqrt(worst_diff*worst_diff'); % row vector times its transpose gives the squared length

2. Calculate the effective Euclidean distance between the two histograms.

    hist2hist_diff = hist1 - hist2;
    histDiff_sqr = hist2hist_diff.*hist2hist_diff;
    user_dist = sqrt(sum(histDiff_sqr))/worst_distance;

3. Determine the probability density functions (pdf) for each histogram.

    hist1_pdf = hist1/hist1_all_samples;
    hist2_pdf = hist2/hist2_all_samples;

4. Find the bins with the most distribution density, and sort on density.

    [hist1_mass, hist1_mass_bins] = sort(hist1_pdf); % ascending, so the densest bins come last
    [hist2_mass, hist2_mass_bins] = sort(hist2_pdf);

5. For hist1 and hist2, find the most dense bins with one standard deviation's worth of sample points.

    bin1_1sdv = NUM_HIST_BINS - min(find(cumsum(fliplr(hist1_mass)) > .68)) + 1;
    bin2_1sdv = NUM_HIST_BINS - min(find(cumsum(fliplr(hist2_mass)) > .68)) + 1;
    bins_1sdv = union(hist1_mass_bins(bin1_1sdv:NUM_HIST_BINS), hist2_mass_bins(bin2_1sdv:NUM_HIST_BINS));

6. Model the variance between histograms as the variation of the bin-to-bin distances: determine the average
bin-to-bin distance between 1-sigma bins in hist1 and hist2. This is the estimated distance variation between
corresponding sample points; the more consistent the distance between bins, the more certain is the separation
between histograms.

    mean_diff = mean(hist2hist_diff(bins_1sdv));
    deviation = hist2hist_diff(bins_1sdv) - mean_diff;
    variance_dist = sqrt(mean(deviation*deviation'))/worst_distance;

7. Define histogram correlation as the ratio of the bin-wise Euclidean distance to the worst-case distance, and define
histogram classification correlation as the ratio of the one-standard-deviation bin-wise Euclidean distance to the
worst-case distance.

    user1sdvDist = sqrt(sum(histDiff_sqr(bins_1sdv)))/worst_distance;

8. Calculate dimension-wise clusters as separated if the sigma-reduced cluster distance is positive, for all types
of dimensions.

    ClassCutoffDims(idx) = user1sdvDist - variance_dist; % for idx = 1 to NUM_DIMS
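For readers who want to experiment with the eight steps above, the following is a minimal Python sketch of the same calculation for one histogram dimension. It is an illustration under stated assumptions, not the patented implementation: the function name `histogram_separation` and the parameter `mass_fraction` are hypothetical, histograms are plain lists of counts, and step 6 follows the literal MatLab reading in which `deviation*deviation'` is the sum of squared deviations.

```python
import math


def histogram_separation(hist1, hist2, mass_fraction=0.68):
    """Return (user_dist, class_cutoff) for two histograms of counts.

    Hypothetical helper mirroring steps 1-8; names are illustrative only.
    """
    if len(hist1) != len(hist2):
        raise ValueError("histograms must have the same number of bins")
    n1, n2 = sum(hist1), sum(hist2)
    if n1 <= 0 or n2 <= 0:
        raise ValueError("each histogram needs at least one sample")

    # Step 1: worst case, all samples of each histogram in different bins.
    worst_distance = math.sqrt(n1 * n1 + n2 * n2)

    # Step 2: normalized bin-wise Euclidean distance.
    diff = [a - b for a, b in zip(hist1, hist2)]
    user_dist = math.sqrt(sum(d * d for d in diff)) / worst_distance

    # Steps 3-5: densest bins that together hold more than mass_fraction
    # of each histogram's samples.
    def densest_bins(hist, total):
        order = sorted(range(len(hist)), key=lambda i: hist[i], reverse=True)
        chosen, mass = [], 0.0
        for i in order:
            chosen.append(i)
            mass += hist[i] / total
            if mass > mass_fraction:
                break
        return chosen

    bins = sorted(set(densest_bins(hist1, n1)) | set(densest_bins(hist2, n2)))

    # Step 6: variation of the bin-to-bin distances over the selected bins
    # (assumption: sum of squared deviations, as in the literal pseudo-code).
    selected = [diff[i] for i in bins]
    mean_diff = sum(selected) / len(selected)
    variance_dist = (
        math.sqrt(sum((d - mean_diff) ** 2 for d in selected)) / worst_distance
    )

    # Step 7: one-standard-deviation bin-wise distance.
    sdv_dist = math.sqrt(sum(d * d for d in selected)) / worst_distance

    # Step 8: sigma-reduced distance; positive means "separated".
    return user_dist, sdv_dist - variance_dist
```

Under this literal reading the sigma-reduced distance cannot fall below zero, because the deviation norm never exceeds the raw norm over the same bins; a value of zero is therefore the "not separated" case in this sketch.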



[0220] The global dimension reduction procedure removes a dimension if ClassCutoffDims for a particular
classification dimension is negative over substantially all Ad prototypes, since that feature has little, or no,
predictive value to the system. The result of the local and global prototype pruning is a minimal description of
important feature values that identify, and separate, each advertising category.
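The global pruning rule can be sketched in Python as follows. This is a hedged illustration only: the function name `prune_global_dimensions`, the input layout (one list of per-dimension cutoff values per prototype comparison), and the 0.9 reading of "substantially all" are assumptions. A dimension counts as non-separating when its cutoff is not positive, mirroring the `> 0` test used for classification.

```python
def prune_global_dimensions(cutoffs_by_comparison, min_fraction=0.9):
    """Return the indices of dimensions worth keeping.

    cutoffs_by_comparison: list of rows, one per prototype comparison,
    each row holding one ClassCutoffDims-style value per dimension.
    min_fraction: assumed meaning of "substantially all" (hypothetical).
    """
    if not cutoffs_by_comparison:
        return []
    num_dims = len(cutoffs_by_comparison[0])
    total = len(cutoffs_by_comparison)
    keep = []
    for dim in range(num_dims):
        # Count comparisons in which this dimension fails to separate.
        non_separating = sum(1 for row in cutoffs_by_comparison if row[dim] <= 0)
        if non_separating / total < min_fraction:
            keep.append(dim)
    return keep
```

For example, with three comparisons `[[0.2, 0.0, -0.1], [0.1, 0.0, 0.3], [0.0, -0.2, 0.0]]`, the middle dimension never separates and is dropped, while the other two are kept.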

[0221] The BCE provides the Targeting Server (TargServer - Fig. 6) with the optimized set of advertising category
prototypes for download to the MemberAgent in the TV. The final step in the Ad targeting system is to classify a TV
user into their most likely Ad categories. The MemberAgent applies exactly the same cluster distance to the identical
BM as on the server side, with the addition of the following classification steps:

1. For all types of dimensions, calculate cluster membership as separated if the sigma-reduced cluster distance is
positive, then calculate the fraction of separating dimensions.

    ClassCutoffDims(idx) = user1sdvDist - variance_dist; % for idx = 1 to NUM_DIMS
    goodDimVec = find(ClassCutoffDims > 0); % find all separated dimensions
    NUM_GOOD_DIMS = length(goodDimVec);
    goodDims(goodDimVec) = 1; % set good dimensions; goodDims is initialized to 0
    AdGroupClassifRatio = NUM_GOOD_DIMS/NUM_DIMS;

2. Determine pass or fail advertising category membership for system module information.

    CLASS_VOTE_CUT = .4; % fraction of dimensions that must be adequately separated to count cluster as classified
    % determine clusters as separated if the fraction of the classification vote exceeds CLASS_VOTE_CUT
    if AdGroupClassifRatio > CLASS_VOTE_CUT
        AdGroupClusterVote = 1;
    else
        AdGroupClusterVote = 0;
    end
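The two classification steps above can be sketched in Python. The function name `classify_ad_group` is hypothetical; the 0.4 cutoff is the exemplary CLASS_VOTE_CUT value from the pseudo-code, and the input is assumed to be one sigma-reduced distance per dimension.

```python
def classify_ad_group(class_cutoff_dims, class_vote_cut=0.4):
    """Return (AdGroupClassifRatio, AdGroupClusterVote) for one category.

    class_cutoff_dims: one sigma-reduced distance per dimension.
    """
    num_dims = len(class_cutoff_dims)
    if num_dims == 0:
        raise ValueError("at least one dimension is required")
    # A dimension votes "separated" when its cutoff value is positive.
    num_good_dims = sum(1 for c in class_cutoff_dims if c > 0)
    ratio = num_good_dims / num_dims
    # Binary vote: the ratio must strictly exceed the cutoff.
    vote = 1 if ratio > class_vote_cut else 0
    return ratio, vote
```

With five dimensions of which three are positive, the ratio is 0.6 and the vote is 1; with exactly two positive dimensions the ratio equals the cutoff and the vote is 0, since the comparison is strict.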

[0222] Each advertising category group has an AdGroupClassifRatio, which is the fraction of total dimensions that
were adequately separated. It is a proportional measure of how similar two behavioral clusters are. That is, the more
(dis)similar a user's behavior is to the advertising category prototype, the more (fewer) dimensions will overlap, hence
the higher (lower) the AdGroupClassifRatio. Thus, each advertising category prototype has an AdGroupClassifRatio,
or TargetingValue, that characterizes the degree to which a user belongs to that targeted Ad group. The TargServer
provides this information for each advertising category as a distribution of relative membership likelihoods. The
TargServer additionally determines a pass-fail advertising category membership value for system modules that require
a binary prediction. AdGroupClusterVote is a binary membership value equal to one if there were sufficient dimensions
that separated the user from the advertising category prototype under consideration, and zero otherwise.

[0223] MemberAgent periodically, or on demand, computes a user's advertising category membership likelihoods
for system modules to use. The TASAgent uses advertising category membership information to store content that
better matches a user's interest, or an advertising client's marketing goals. According to one embodiment, the TASAgent
stores and deletes programming to statistically maximize the overall TargetingValue of the archived content.
[0224] As a basic example, consider the case where TargServer provides three templates - Male, Female, and
Teenager - to MemberAgent, and it calculates the TV user's membership TargetingValues as .1, .2, and .4. Then,
TASAgent would only store Ads with metadata matching these categories, and in proportion to the TargetingValues. If
the TargetingValues were normalized to sum to one, then they could be read as probabilities of Male = .14, Female =
.29, and Teenager = .57. Hence the TASAgent would store, and delete, Ads to match the same fractional distribution in
local storage, and have stored Ads being 14% for Male, 29% for Female, and 57% for Teenagers. The DispAgent similarly
distributes Ad presentations to match Ad category membership distributions. A wide variety of alternative, and more
sophisticated, targeting optimization strategies that fit into, or extrapolate from, this philosophy are possible.
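The normalization in this example can be checked with a short Python sketch. The raw values 0.1, 0.2, and 0.4 are assumed here because they reproduce the quoted 14%, 29%, and 57% shares; the function names and the largest-remainder rounding used to split whole storage slots are illustrative assumptions, not part of the described system.

```python
def normalize_targeting_values(values):
    """Scale TargetingValues so that they sum to one."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("TargetingValues must have a positive sum")
    return {name: v / total for name, v in values.items()}


def allocate_slots(shares, num_slots):
    """Split num_slots storage slots in proportion to shares.

    Uses largest-remainder rounding (an assumption) so that the
    allocated slots always add up to num_slots.
    """
    exact = {name: share * num_slots for name, share in shares.items()}
    alloc = {name: int(x) for name, x in exact.items()}
    leftover = num_slots - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


raw = {"Male": 0.1, "Female": 0.2, "Teenager": 0.4}
shares = normalize_targeting_values(raw)
stored_ads = allocate_slots(shares, 100)
```

With 100 Ad slots, `stored_ads` holds 14 slots for Male, 29 for Female, and 57 for Teenager, matching the fractional distribution described above.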
[0225] It will be understood that inferring an advertising category from TV usage behavior is a very similar problem
to identifying multiple persons in a household. The main difference is that the user prototypes are probabilistically
inferred with real-time, untagged TV click-stream data. The same methodology and architecture applies to both
problems; however, the multi-user identification problem principally requires additional techniques to effectively
allocate TV usage observations to the correct user profile.

[0226] In addition, or as an alternative, to the foregoing description of the system to narrowly focus advertising
targets, the system of the invention is also suitable to build preferred programming models. Here, the presentation
agent, PresAgent, interacts with the behavioral model BM to build local programming guides. PresAgent derives user
presentation preferences through queries to the BMQagent. To motivate its necessity, an abridged system-level summary
precedes BMQengine interaction details with the PresAgent. In brief, the goal of the PresAgent is to build a programming
guide for a virtual channel whose programming comes from programs locally stored by the TASAgent. The programming
can be entertainment or advertising, audio, video, graphics, or any multi-media content. The TASAgent only stores the
most preferred programs available, and constantly adds and deletes programs to continually fill the local storage while
maximizing the overall user program preference rating. This virtual programming guide or virtual channel may have
a look and feel similar to a normal TV channel. It should seem very natural to place it as just another line in a live
TV program guide. However, the virtual channel has the advantage of being customized to the user's preferences, and
appears as an 'on demand' channel with content and showing times that largely match the viewer's personal
expectations. To approach this goal, the PresAgent analyzes the stored programming presentation metadata and user's
preferences to determine the optimal temporal program placement in the virtual channel's EPG (VEPG).
[0227] Program targeting metadata, especially for Ads, includes presentation information. Ad presentation metadata,
from the head-end, directs the PresAgent to either follow these rules exactly, or to use local preference information to
more intelligently sequence Ad content.

[0228] For non-revenue generating stored programming, the user has a similar option to direct the program arrange- 
ment of the virtual channel. Several VEPG building modes are possible, ranging from trivial, to highly context dependent. 
[0229] A trivial implementation simply displays the local storage contents in the order of when they were recorded, 
and places paid-programming content exactly as specified in its presentation metadata. This has the advantage of 
simplicity, but burdens the user to search through many undesirable programs, and tends to force skipping around the 
guide for each program viewed. This is one step above analog VCR recordings in that it has random access, and a 
content listing. 

[0230] A more sophisticated approach uses non-temporal program preference information to group programs of 
similar ratings. To the extent preference ratings are accurate, this method has the benefit of making it easier for the 
user to skip less liked programs, and continuously view liked programs with much less searching effort paid. However, 
there is still the overall feel of a sorted storage media content listing. 

[0231] A significant advancement over the content preference sorting technique uses temporal and sequential
preferences to create a VEPG ordered according to the real-time viewing context and preference history of the user. To
accomplish this, when the user turns on the TV, or periodically before the TV is turned on, the PresAgent queries the
BMQengine with each stored program presentation context, and dynamically builds a VEPG that best matches the
user's behavioral preferences at that time and circumstance.
[0232] The following general algorithmic steps build a VEPG for a typical case:

1. Find all undesirable viewing times and leave them empty.

2. Place the program most likely preferred at TV power-on in the current time slot.

3. Find all local program transition combinations and temporal preferences, and sequence programs accordingly.

[0233] As a tutorial example, assume the following 9, presumed preferred, programs are locally stored:

Title                  Genre                  Air Time                 Channel        Duration (min)
Market Wrap            Series/Finance         [illegible] weekdays     [illegible]    [illegible]
Star Trek              Series/Sci-Fi          [illegible] weekdays     [illegible]    [illegible]
Friends                Series/Comedy          8pm weekdays             [illegible]    [illegible]
The Tonight Show       Series/Talk            11:30pm weekdays         [illegible]    [illegible]
The Terminator         Movie/Sci-Fi           [illegible]              HBO            150
Seinfeld               Series/Comedy          7pm weekdays             FOX            30
Saturday Night Live    Series/Comedy          11:30pm weekdays         NBC            90
NOVA                   Series/Documentary     9pm Tues.                PBS            60
NFL football           Sports                 6pm Mon.                 ABC            210

[0234] While the TV is off, or upon turn on, the PresAgent determines the context of the current session. The current
context includes information such as day of week, the time of day, time since last session, and last title/genre/channel.

[0235] The first items to determine are the time intervals never watched. This will blank out VEPG time intervals that
historically often go unwatched. The query looks like:

Query: [QueryFunction = time_sum, StateType = LikedChannels, fromStateID = null, toStateID = null, TimeType = TOD,
TimeValue = null].

[0236] Here we used LikedChannels as the TOD activity indicator. Any other 'liked' state category would have served
equally well.
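The query fields used throughout this section can be captured in a small Python record for experimentation. This is only a sketch: the class name `BMQuery`, its attribute names, and the `render` method are hypothetical conveniences, and the rendered text simply follows the bracketed query notation used in this description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BMQuery:
    """Hypothetical container for one behavioral model query."""

    query_function: str                 # e.g. time_sum, top_n=5, mostLikely
    state_type: str                     # e.g. LikedChannels, genre, Title
    from_state_id: Optional[str] = None
    to_state_id: Optional[str] = None
    time_type: str = "nonTemporal"      # e.g. TOD, DOW, TIP
    time_value: Optional[str] = None

    def render(self) -> str:
        """Format the query in the bracketed notation of the description."""
        def show(value):
            return "null" if value is None else value

        return (
            f"[QueryFunction = {self.query_function}, "
            f"StateType = {self.state_type}, "
            f"fromStateID = {show(self.from_state_id)}, "
            f"toStateID = {show(self.to_state_id)}, "
            f"TimeType = {self.time_type}, "
            f"TimeValue = {show(self.time_value)}]"
        )


never_watched_query = BMQuery("time_sum", "LikedChannels", time_type="TOD")
```

Calling `never_watched_query.render()` reproduces the time-of-day activity query shown in paragraph [0235], with unset fields printed as `null`.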

[0237] A typical response to the TASAgent's query could be:

[(LATE_NIGHT, VERY_OFTEN); (WEE_HOURS, NEVER); (EARLY_MORNING, NEVER); (MORNING, MOSTLY);
(LATE_MORNING, RARELY); (AFTER_NOON, RARELY); (LATE_AFTER_NOON, SOMETIMES); (EVENING,
ALMOST_ALWAYS); (NIGHT, TYPICALLY)]

[0238] The TASAgent searches the result matrix for the least likely TOD intervals, in particular

[(WEE_HOURS, NEVER); (EARLY_MORNING, NEVER); (LATE_MORNING, RARELY); (AFTER_NOON,
RARELY)].

[0239] The corresponding time intervals would be left blank in the VEPG. However, if the current TV viewing period
is in a blanked interval, the current VEPG time intervals are made available for at least the user's typical TV session
length. That is, upon TV turn on, there is always programming listed in the current VEPG time index, and at least as
long as the user normally watches TV for that period.
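The blanking rule can be sketched in Python as follows. The function name, the set of labels treated as "least likely", and the simplification that the whole current interval is reopened (rather than only the typical session length within it) are assumptions made for illustration.

```python
# Frequency labels treated as "least likely" (assumption based on the example).
UNLIKELY_LABELS = {"NEVER", "RARELY"}

EXAMPLE_RESPONSE = [
    ("LATE_NIGHT", "VERY_OFTEN"),
    ("WEE_HOURS", "NEVER"),
    ("EARLY_MORNING", "NEVER"),
    ("MORNING", "MOSTLY"),
    ("LATE_MORNING", "RARELY"),
    ("AFTER_NOON", "RARELY"),
    ("LATE_AFTER_NOON", "SOMETIMES"),
    ("EVENING", "ALMOST_ALWAYS"),
    ("NIGHT", "TYPICALLY"),
]


def blanked_intervals(tod_response, current_interval=None):
    """Return the time-of-day intervals to leave empty in the VEPG.

    tod_response: list of (interval, frequency_label) pairs.
    current_interval: interval in which the TV is being turned on, if any;
    it is never blanked, so programming is always listed at turn-on.
    """
    blank = [name for name, label in tod_response if label in UNLIKELY_LABELS]
    return [name for name in blank if name != current_interval]
```

Applied to `EXAMPLE_RESPONSE`, the function returns the four intervals listed in paragraph [0238]; if the TV is turned on during WEE_HOURS, that interval is kept available and only the other three are blanked.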

[0240] The available time intervals are searched for preferential program placement. 

[0241] The PresAgent proceeds to search for programming that the user prefers upon starting a TV session. Each
program is searched for channel/genre/title/actor/etc. start-up preference. Each modeled behavioral state (i.e., liked
channel, genre, title, etc.) is queried, and results are accumulated in a StartUpRatings matrix. A typical query to search
for start-up genre preferences is:

Query: [QueryFunction = top_n=5, StateType = genre, fromStateID = off, toStateID = null, TimeType = TOD,
TimeValue = night].

[0242] The same query style is repeated for each state type, and the results are compared against the available 
programs. 

[0243] PresAgent further considers contextual preferences with respect to the last program viewed by searching
through every combination of temporal and StateType transitions.

Assume that the new session's DOW = Monday, TOD = night (10 pm), last_title = 'Wheel of Fortune' @ Monday evening
7pm, last_genre = game_show, and last_channel = NBC.

[0244] A typical query includes a search for likely transitions occurring the amount of time since the last title, genre,
and channel viewed, three hours (10pm - 7pm) for this example. A search for the top 3 preferred title transitions three
hours after watching 'Wheel of Fortune' is:

Query: [QueryFunction = top_n=3, StateType = Title, fromStateID = 'Wheel of Fortune', toStateID = null, TimeType
= TIP, TimeValue = 3hrs].

[0245] A similar search is repeated for genre, and channel.

[0246] PresAgent compares the bias for all StartUpRatings and last-program-based preferences against the remaining
programs for the best match. If, for this example, a likely start-up genre was 'comedy series', and the most likely
start-up channel is 'NBC', then a matching program with the highest preference rating, say Seinfeld, would be placed
as the program in the current time slot if no other transition is more preferred three hours after watching 'Wheel of
Fortune', game_shows, or NBC.

[0247] The system keeps a table of session times for every time interval, each day of the week. If a typical session
for this user at this time is 1.5 hours, then this is the time block to fill. The PresAgent tests each stored program for
transitional bias to follow Seinfeld. Again, all liked state types are searched for the most likely transition from the state
associated with Seinfeld. A typical first query of the overall transition preference search could be:

[QueryFunction = mostLikely, StateType = Title, fromStateID = 'Seinfeld', toStateID = null, TimeType = TOD,
TimeValue = night]

[0248] PresAgent finds the closest match and places it after Seinfeld. For example, the user may have a strongest
preference to watch a science documentary series after a short comedy independent of time, and NOVA would best
follow. After two programs are linked, PresAgent also queries for any type of state sequence preferences; i.e., title,
genre, channel, etc. For the example sequence, an initial query is:

"QueryFunction = mostLikely, StateType = TitleSequ, fromStateIDs = ["Seinfeld", "NOVA"], LengthValue = null"

or

"QueryFunction = mostLikely, StateType = GenreSequ, fromStateIDs = ["series:comedy", "series:science"],
LengthValue = null"

[0249] PresAgent attaches the most likely and specific result to the prior sequence. This process continues for each 
program at the end of the growing sequence, until the typical session time block is filled. PresAgent proceeds to fill all 
other available VEPG time blocks. Each block of time starts with a sequence seed to grow from. 
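The sequence-growing loop described above can be sketched as a greedy procedure in Python. This is an illustration under assumptions: `transition_score` is a hypothetical stand-in for a BMQengine "mostLikely" transition query, programs are only appended if they still fit in the session block, and ties are broken alphabetically. None of these details are specified by the description.

```python
def build_sequence(seed, durations, transition_score, block_minutes):
    """Grow a program sequence from seed until the session block is filled.

    durations: mapping of program title to length in minutes.
    transition_score: callable (last_title, candidate_title) -> preference.
    block_minutes: typical session length for this time block.
    """
    sequence = [seed]
    used = durations[seed]
    remaining = set(durations) - {seed}
    while remaining and used < block_minutes:
        last = sequence[-1]
        # Only consider programs that still fit in the block (assumption).
        candidates = sorted(
            p for p in remaining if used + durations[p] <= block_minutes
        )
        if not candidates:
            break
        best = max(candidates, key=lambda p: transition_score(last, p))
        sequence.append(best)
        used += durations[best]
        remaining.discard(best)
    return sequence


# Illustrative data: durations from the tutorial table, made-up scores.
durations = {"Seinfeld": 30, "NOVA": 60, "NFL football": 210}
scores = {("Seinfeld", "NOVA"): 0.8, ("Seinfeld", "NFL football"): 0.3}
lineup = build_sequence(
    "Seinfeld", durations, lambda a, b: scores.get((a, b), 0.0), 90
)
```

For a 1.5 hour block seeded with Seinfeld, `lineup` is Seinfeld followed by NOVA, which fills the 90 minutes exactly; the preference scores are invented for the example.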
[0250] The PresAgent tests each remaining program over all available time slots, and places highly likely temporal
(non-temporal, DOW, TOD, TIP) matches accordingly. For example, a typical query to check the 'Football' program
placement preference could start on an available Sunday afternoon slot:

Query: [QueryFunction = mostLikely, StateType = likedGenre, fromStateID = 'sports', toStateID = null, TimeType
= DOW, TimeValue = Sunday]

and

Query: [QueryFunction = mostLikely, StateType = likedGenre, fromStateID = 'sports', toStateID = null, TimeType
= TOD, TimeValue = afternoon]

[0251] If watching sports on Sunday afternoon was more likely than any other remaining program, and alternate time 
placement, then football would start that time block, and the herein described sequence building method would fill the 
rest of the session block. 

[0252] The first pass of the VEPG placement algorithm only commits highly preferred programs in each context. If
any programs remain for VEPG entry, subsequent iterations place the most likely programs. If there is not sufficient
historic evidence to infer upon, the PresAgent makes arbitrary placements as a last resort.

[0253] Importantly, every time a viewer turns on the TV, or a new user is detected, PresAgent generates a potentially
different VEPG customized to the viewer's preferences, and the context of that session. Several other refinements,
optimizations, and extensions on the basic VEPG building mechanism are possible and contemplated. Some, herein
described, additional contextual resolution techniques expand on the aforementioned algorithm. The BMQengine
provides the PresAgent with many other contextual, and behavioral bias queries. Some include:

1. Last program watched

2. Behavioral psychometric

3. Attention span

4. Ending bias

[0254] These measures affect sequential program placement preference as follows.

[0255] Item 1 is a mechanism to recalculate future VEPG entries based on the last program viewed by the user.
As in determining start-up program preferences, discussed herein, the PresAgent queries for all temporal and
StateType transition preferences from the programs chosen by the user. The VEPG is rebuilt, as previously prescribed,
with the most likely query result matching program as the new seed.

[0256] PresAgent uses psychometrics, item 2, such as diversity, curiosity, focus, and attention span to adjust program
sequencing closer to the viewer's liking. For example, if the user has a very high (low) genre diversity or focus
measure, then the PresAgent proportionately avoids (prefers) sequences that repeat the same genre. Similarly, a high
(low) curiosity measure biases the PresAgent to proportionately prefer (avoid) related, but less frequented sequence
candidates. A high (low) curiosity metric may arise from a user who has a low (high) attention, and often fails (succeeds)
to find liked programming. Another possibility is that the user has a small core group of liked program types, but often
searches beyond this group for new programs of interest. The curiosity measure, thus, looks for a user's high tendency
to search outside past liked program types, with little information of why.

[0257] Attention span, item 3, detects the amount and quality of time a user tends to spend on various program
aspects. A histogram models the attention distribution for each modeled parameter. Tracked parameters include any
EPG category entry, such as a genre type, a title, a channel, or TV watching statistics including session times. All
attention span parameters are context sensitive as supported by the BM. PresAgent uses attention span to determine
the program length most preferred by the viewer in a given context. For example, a user could generally like a long
drama movie, but not prefer it in the mornings before going to work, or after a game show, and most prefer it on weekend
nights. This presentation filter avoids many of the obvious user program length preference conflicts by using past time
watched in a given context as a bias to favor future programs of similar length.
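One simple way to picture the attention span filter is a histogram of past watch times for a context, scored against a candidate program's length. The Python sketch below is an assumption-laden illustration: the function name, the 30-minute bin width, and the use of the matching bin's share as the bias are all hypothetical choices.

```python
def length_preference(watch_minutes, candidate_minutes, bin_width=30):
    """Score a candidate program length against past watch times.

    watch_minutes: past viewing durations (minutes) observed in one context,
    for example weekday mornings.
    Returns the share of past sessions that fall in the same length bin
    as the candidate, between 0.0 and 1.0.
    """
    if not watch_minutes:
        return 0.0
    histogram = {}
    for minutes in watch_minutes:
        bin_index = int(minutes // bin_width)
        histogram[bin_index] = histogram.get(bin_index, 0) + 1
    candidate_bin = int(candidate_minutes // bin_width)
    return histogram.get(candidate_bin, 0) / len(watch_minutes)


# Illustrative weekday-morning history: only short sessions were observed.
morning_history = [25, 30, 20, 35, 28]
movie_bias = length_preference(morning_history, 150)
sitcom_bias = length_preference(morning_history, 30)
```

With this made-up history, a 150-minute movie gets a bias of 0.0 for the morning context while a 30-minute program gets 0.4, so the shorter program would be favored in that slot.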

[0258] Once a TV session, or a VEPG program sequence, approaches the typical TV watching attention span in that
context, the PresAgent has a preference to place programs that best match the user's ending bias history.

[0259] Ending bias, item 4, is the past tendency to end a TV session after watching a certain BM category. An example
of the ProfAgent learning a new ending bias from program replay selections is if a user often stops watching TV in the
late evening after replaying late night talk shows, such as 'The Tonight Show', during the week, instead of late night.
The ProfAgent learns from live and replayed program usage equally. The ability to learn preferred replaying patterns
enables the PresAgent to adaptively place programming in the most preferred VEPG context. A typical TOD ending
bias BMQagent query for any StateType, e.g., for talk shows, appears like:

Query: [QueryFunction = mostLikely, StateType = likedGenre, fromStateID = null, toStateID = Series/Talk,
TimeType = TOD, TimeValue = late_evening].

[0260] The PresAgent, having a similar task as the PDE, creates a virtual program viewing guide that tends to match 
the daily variation and novelty that a user prefers. 

[0261] PresAgent places VEPG Ads in a very similar fashion as non-paid programming, except placement decisions
include presentation metadata and machine-learned user targeting information. Ads differ from programs in another
significant way, in that their initial VEPG placement is only a default initialization, subject to modification depending on
the user's programming choices and the rights of the Ad company as communicated via controlling metadata. Although
contemplated as an alternative embodiment, it is similarly possible to continually rebuild the programming VEPG based
on the user's in-progress viewing behavior; however, the primary benefit applies to Ad scheduling. The Ads are not
necessarily visible in the VEPG, but are scheduled as inter-program and intra-program commercial breaks. The breaks
are either head-end (intra-program), or internally (inter-program) generated.

[0262] The PresAgent is aware, in advance, through program metadata, or some other means, of the exact timing
for intra-program Ad breaks. In the present embodiment, the PresAgent prefills all Ad breaks with optimally selected
pre-stored Ads. In practice, there is a significant advantage to this procedure. Often, due to limited system resources
in the TV, there is not enough time to calculate, in real time, the best Ads to schedule in an Ad break that may be only
a few minutes away. This situation usually occurs at the beginning of a program, or when someone arrives just before
a scheduled Ad break. In that case, the default PresAgent sequencing of Ads is a best estimate of optimal placement.
When there is enough time for calculations, the PresAgent can query the BMQengine for user Ad sequencing
preferences.

[0263] The procedure to sequence Ads is the same as that for programming with the following Ad specific definitions:

1. title is the product's UPC or Ad sponsor's name

2. genre is the sponsoring company's main SIC

3. The semaphore Ad_null replaces null as a query wildcard to search only Ads.

[0264] Several exemplary queries follow that demonstrate a range of Ad sequencing contextual placement
capabilities.

Example A:

[0265] Find the top three products (UPC) liked at night during Seinfeld:

Query: [QueryFunction = top_n=3, StateType = title, fromStateID = Ad_null, toStateID = 'Seinfeld', TimeType =
TOD, TimeValue = night]

Example B:

[0266] Find the top 5 programs liked on Sunday after a Pepsi commercial:

Query: [QueryFunction = top_n=5, StateType = title, fromStateID = Pepsi_UPC, toStateID = null, TimeType =
DOW, TimeValue = Sunday]

Example C:

[0267] Find the most liked genre at any time before an auto parts commercial:

Query: [QueryFunction = mostLikely, StateType = LikedGenre, fromStateID = null, toStateID = AutoParts_SIC,
TimeType = nonTemporal, TimeValue = null]

Example D:

[0268] Of the user's liked Ads, find the top 3 Ad product categories during a sports program on Sundays:

Query: [QueryFunction = top_n=5, StateType = LikedGenre, fromStateID = sports, toStateID = Ad_null, TimeType
[remainder of query illegible in source]

[0269] [...] guide VEPG can be established, we move to describe a specific implementation of an exemplary VEPG. The
PresAgent builds a default [...] program preferences of the user before receiving new user input of actual choices, [...]
and able, when time permits, to rebuild the default VEPG in real time based on user content selection patterns. The
combination of a 'best educated guess' default content placement and real-time, context-sensitive recalculation
provides for a robust, and optimal, user preference estimation.

[0270] A typical coarse VEPG generated by the PresAgent from the example set of stored programs appears as:



[Table: coarse VEPG grid. Only the Night row is recoverable from the source; it lists 'Seinfeld' followed by 'The
Tonight Show' in each of five cells. A 'Saturday' label also survives; the remaining rows and cells are illegible.]



[0271] This program placement could arise from the following scenario of system-detected, user contextual
preferences. The Terminator program requires a long attention span, and although weekday nights qualify equally with
Saturday night, the liking of movies largely occurs in the latter time slot. Suppose the user has a strong general
preference to [...], and financial programs during weekday late afternoons, after coming home from work.
The PresAgent, furthermore, could detect that after the TASAgent recorded 'Market Wrap' a few times, the user watched
it during this period, and places it accordingly. However, if the PresAgent detects a stronger bias to watch Monday
night 'NFL football' on, say, Tuesday late afternoons instead, with no financial programming [...], the
football game takes the latter, more preferred, slot. The user might similarly show a preference [...] 'Saturday Night
Live', but on Sunday nights instead. A repeatable pattern typically could be starting a weekday night session with
'Seinfeld' and a strong tendency to watch, and end the session with, 'The Tonight Show' thereafter.
[0272] If the PresAgent detects a low (high) state diversity across a temporal context, such as a daily time slot, it
prefers to decrease (increase) the variety of programs in that time period. In the present example, the user has a low
diversity measure in the late afternoon and night time blocks, but a much higher measure during the evening time
period. In the absence of sequential or temporal bias, the PresAgent can use diversity, or curiosity, information to
distribute preferred programming appropriately. In this case, the PresAgent is aware that the evening time slot is
popular [...] user has a similar preference for Sci-Fi, comedies, and movies. However, if the user's diversity measures
are higher in this period, then the PresAgent will avoid filling the daily slot with only the most preferred TV program
type, say 'Star Trek', and instead distribute the available slots with a variety of short, liked programming. If NOVA
has a little liked rating in the past, a high curiosity valuation in the evening slots would motivate the PresAgent to
insert NOVA into the lineup. Importantly, the user would find a VEPG that reflects their 'prime time' as night (9-12pm)
instead of the [...], and viewing pattern that matched their highly repeatable behaviors, with the periodic exceptions
that arise, and fills their more exploratory, if any, periods with the range of programs that they might like.
[0273] Over time, the system detects highly repeatable preference patterns, as well as [...] of, and learning from,
stored program usage patterns continually teaches the system when, and in what
sequence, program categories are preferred. A parallel description applies to optimal Ad placement.
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Ciaims 

1 . A television rating sysiem for targeted program delivery, connprislng: 

a clustering engine receiving television viewing data input, processing the viewing data input, and generating 
user profiles targeting advertising category' groups; 

a client-side systenn adapted to classify a television user into at least one advertising category group; 

a conte)crual behavioral profiling system connected to said client-side system and determining a television 
user's viewing behavior with content and usage-related preferences; and 

a behavioral model database connected to said profiling system and storing therein information with the
television user's viewing behavior.

2. The system according to claim 1, wherein said clustering engine is a software agent residing in a central computer
system at a television distribution head-end and is programmed to create template behavioral profiles correspond- 
ing to targeted advertising categories of television viewers. 

3. The system according to claim 2, wherein said clustering engine is trained substantially exclusively on tagged 
viewing data from a given target group to learn a most general profile of the given target group. 

4. The system according to claim 2, wherein said clustering engine is programmed to generalize viewer's profiles in 
each group into a representative aggregation for a respective advertising category, and to form advertising category 
profiles by aggregating all dimensions most strongly in common for the given group and most unique across target 
groups. 

5. The system according to claim 1, which further comprises an advertisement manager connected to query said
behavioral model database, said advertisement manager being programmed to parameterize behavioral profiles
of said behavioral model database and to download the parameterized behavioral profiles to an advertising cate- 
gory membership agent residing at said client-side system. 

6. The system according to claim 5, wherein said advertising category membership agent is configured to reconstruct
the downloaded parameterized targeting models, and apply a clustering engine to the television user's history to 
determine a most likely advertising category the user belongs to and store the results as targeting category prob- 
abilities in a user category database. 

7. The system according to claim 5, which further comprises targeting agents and presentation agents disposed at 
said client-side system for combining the targeting category probabilities and relevant preference information to
selectively capture, store, and display advertisements downloaded in accordance with the optimization. 

8. In an interactive display system with a head-end side distributing program content and a client side receiving the
program content and selectively displaying the program content in accordance with a user's selection, a preference 
engine for determining the user's preferred program content, comprising: 

a user monitoring device connected at the client side to record contextual transition behaviors profiling one or 
more users and to continually build a knowledgebase of preferences and contextual transition behaviors pro- 
filing the one or more users; and 

a device for providing to the one or more users the program content in accordance with the user's demographic 
information and with the contextual transition behavior profile. 

9. The preference engine according to claim 8, wherein said user monitoring device models the user's behavioral 
interaction with advertising program content and with entertainment program content.

10. The preference engine according to claim 8, connected to receive from the head-end metadata describing
advertising content and metadata describing entertainment program content, and programmed to establish content
preferences by combining metadata information with the contextual transition behavior profile, and to build a relational
knowledge base with associations between the user's behavior, demographics, and program content preferences.

11. The preference engine according to claim 8, programmed to model patterns of usage behaviors with a behavioral
model and to extract key usage information from the behavioral model into a behavioral database, wherein each
entry in the behavioral database has a confidence value associated therewith that reflects an estimate of a structural
and sampling quality of the data used to calculate the database entry.

12. In a program content delivery system having a head-end side and a client side, a system for targeted program
delivery, comprising:

a central data system at the head-end side receiving viewing data selected from the group consisting of watch
date, watch start time, watch duration, and watch channel, demographic information describing a program
user, and an electronic program guide with metadata describing a program content;

a demographic cluster knowledge base acquirer receiving behavioral data of the user and outputting a knowledge
base in form of a transition matrix with weight sets, the transition matrix predicting a demographic group
of the user; and

a program content generating module providing to the client side streams of program content including ad- 
vertisements based on the predicted demographic group of the user. 

13. The system according to claim 12, which further comprises a realtime feedback link for delivering to said central
data system realtime information concerning a user's viewing behavior with click stream data.

14. The system according to claim 12, wherein said demographic cluster knowledge base acquirer is based on a 
hidden Markov model. 

15. The system according to claim 12, wherein said demographic cluster knowledge base acquirer and said program
content generating module are software modules each adapted to be stored on a machine-readable medium in
the form of a plurality of processor-executable instructions.

16. The system according to claim 12, wherein said demographic cluster knowledge base acquirer generates
demographic cluster information of the user in terms of statistical state machine transition models.

17. The system according to claim 16, wherein the state machines are defined in the transition matrix, and the transition
matrix contains information of program transitions initiated by the viewer.

18. The system according to claim 12, wherein the transition matrix is one of at least two concurrent transition matrices
including a channel matrix and a genre matrix. 

19. The system according to claim 12, wherein the transition matrix is a two-dimensional matrix with transitions from 
television channels to television channels in temporal form. 

20. The system according to claim 12, wherein said demographic cluster knowledge base acquirer is configured to
parameterize the user's behavior with a double random pseudo hidden Markov process, and to define a low-level
statistical state machine modeling a behavioral cluster and a top-level statistical state machine with active
behavioral clusters and an interaction between the active behavioral clusters.

21. The system according to claim 12, wherein said demographic cluster knowledge base acquirer is configured to
define a double random process with a plurality of dimensions, and to determine parallel statistical state machine
transition events in at least two of three state categories including channel, genre, and title of the program content.

22. A method of determining a television viewer's viewing habits, the method which comprises:

recording a viewer's monitor behavior with data item variables selected from the group consisting of watch
date, watch start time, watch duration, and watch channel;

inputting historical data information regarding demographic information tagged to the viewer; 
inputting program guide information; and

associating the program guide information with the viewer's monitor behavior and defining therefrom a
knowledge base with demographic cluster information of the viewer in terms of statistical state machine transition
models.

23. The method according to claim 22, wherein the step of defining the knowledge base comprises calculating a
parameterized transition matrix defining the viewer's viewing habits, the transition matrix containing information of
program transitions initiated by the viewer.

24. The method according to claim 23, which comprises defining at least two concurrent transition matrices including 
a channel matrix and a genre matrix. 

25. The method according to claim 23, which comprises defining the transition matrix as a two-dimensional matrix with
transitions from television channels to television channels in temporal form.

26. The method according to claim 22, which comprises providing feedback information with the viewer's monitor 
behavior by recording a click stream. 

27. The method according to claim 22, which comprises parameterizing the viewer's monitor behavior with a double
random pseudo hidden Markov process, and defining a low-level statistical state machine modeling a behavioral
cluster and a top-level statistical state machine with active behavioral clusters and an interaction between the
active behavioral clusters.

28. The method according to claim 27, which comprises defining the double random process with a plurality of
dimensions, and determining parallel statistical state machine transition events in at least two of three state categories
including channel, genre, and title.

29. A machine-readable medium having stored thereon a plurality of processor-executable instructions for
implementing a function of:

capturing state transitions by defining monitor behavior in a plurality of statistical state machine families each 
representing a given viewer or demographic group viewing behavior; 

combining the statistical state machine families into global statistical state machines defined in a global
probability density function;

updating and reinforcing the global probability density function upon determining that a given probability func- 
tion has a higher confidence level than a previous probability density function; and 

outputting a global profile based on the global probability density function, wherein the global profile is suitable
for determining programming content of a television server.

30. The machine readable medium according to claim 29, wherein the state transitions represent a television viewer's
monitor behavior and the statistical state machines are selected from the group consisting of watch date, watch
start time, watch duration, and watch channel.

31. The machine readable medium according to claim 29, wherein the global profile represents demographic cluster
information of the viewer in terms of the statistical state machine transition models. 

32. The machine readable medium according to claim 29, wherein the state machines are defined in a parameterized 
transition matrix defining the viewer's viewing habits, the transition matrix containing information of program tran- 
sitions initiated by the viewer. 

33. The machine readable medium according to claim 32, wherein the transition matrix is one of at least two concurrent
transition matrices including a channel matrix and a genre matrix. 

34. The machine readable medium according to claim 29, wherein the transition matrix is a two-dimensional matrix
with transitions from television channels to television channels in temporal form.

35. The machine readable medium according to claim 29, configured to parameterize the viewer's monitor behavior
with a double random pseudo hidden Markov process, and defining a low-level statistical state machine modeling
a behavioral cluster and a top-level statistical state machine with active behavioral clusters and an interaction
between the active behavioral clusters.

36. The machine readable medium according to claim 29, which comprises defining the double random process with
a plurality of dimensions, and determining parallel statistical state machine transition events in at least two of three
state categories including channel, genre, and title.